ABSTRACT: We present a model of similarity-based retrieval that attempts to capture
three  seemingly  contradictory  psychological  phenomena:  (1)  structural  commonalities  are 
weighed  more  heavily  than  surface  commonalities  in  similarity  judgments  for  items  in 
working  memory;  (2)  in  retrieval,  superficial  similarity  is  more  important  than  structural 
similarity;  and  yet  (3)  purely  structural  (analogical)  remindings  are  sometimes 
experienced.  Our  model,  MAC/FAC,  explains  these  phenomena  in  terms  of  a  two-stage 
process. The first stage uses a computationally cheap, non-structural matcher to filter
candidate long-term memory items. It uses content vectors, a redundant encoding of
structured  representations  whose  dot  product  estimates  how  well  the  corresponding 
structural  representations  will  match.  The  second  stage  uses  SME  to  compute  structural 
matches  on  the  handful  of  items  found  by  the  first  stage.  We  show  the  utility  of  the 
MAC/FAC  model  through  a  series  of  computational  experiments:  (1)  We  demonstrate 
that  MAC/FAC  can  model  patterns  of  access  found  in  psychological  data,  (2)  We  argue 
via  sensitivity  analyses  that  these  simulation  results  rely  on  the  theory,  and  (3)  We 
compare the performance of MAC/FAC with ARCS, an alternate model of similarity-based
retrieval, and demonstrate that MAC/FAC explains the data better than ARCS.
Finally,  we  discuss  limitations  and  possible  extensions  of  the  model,  relationships  with 
other  recent  retrieval  models,  and  place  MAC/FAC  in  the  context  of  other  recent  work  on 
the  nature  of  similarity. 

1. Introduction

Similarity-based  remindings  range  from  the  sublime  to  the  stupid.  At  one  extreme,  seeing  the  periodic  table 
of  elements  reminds  one  of  octaves  in  music.  At  the  other,  a  bicycle  reminds  one  of  a  pair  of  eyeglasses. 
Often,  remindings  are  neither  brilliant  nor  superficial  but  simply  mundane,  as  when  a  bicycle  reminds  one 
of  another  bicycle.  Theoretical  attention  is  inevitably  drawn  to  spontaneous  analogy:  i.e.,  to  structural 
similarity  unsupported  by  surface  similarity,  as  in  the  octave/periodic  table  comparison.  Such  remindings 
seem  clearly  insightful  and  seem  linked  to  the  creative  process,  and  should  be  included  in  any  model  of 
retrieval. But as we review below, research on the psychology of memory retrieval points to a
preponderance of the latter two types of similarity: (mundane) literal similarity, based on both structural
and superficial commonalities, and (dumb) superficial similarity, based on surface commonalities. A
major  challenge  for  research  on  similarity-based  reminding  is  to  devise  a  model  that  will  produce  chiefly 
literal-similarity  and  superficial  remindings,  but  still  produce  occasional  analogical  remindings. 

A  further  constraint  on  models  of  access  comes  from  considering  the  role  of  similarity  in  transfer  and 
inference.  The  large  number  of  superficial  remindings  indicates  that  retrieval  is  not  very  sensitive  to 
structural  soundness.  But  appropriate  transfer  requires  structural  soundness,  so  that  knowledge  can  be 
exported  from  one  description  into  another.  And  psychological  evidence  (also  discussed  below)  indicates 
that  the  mapping  process  involved  in  transfer  is  actually  very  sensitive  to  structural  soundness.  Hence  our 
memories  often  give  us  information  we  don't  want,  which  at  first  seems  somewhat  paradoxical.  Any  model 
of  retrieval  should  explain  this  paradox. 

This  paper  presents  MAC/FAC,  a  model  of  similarity-based  reminding  that  attempts  to  capture  these 
phenomena.  MAC/FAC  models  similarity-based  retrieval  as  a  two-stage  process.  The  first  stage  (MAC) 
uses  a  cheap,  non-structural  matcher  to  quickly  filter  potentially  relevant  items  from  a  pool  of  such  items. 
These  potential  matches  are  then  processed  in  the  FAC  stage  by  a  more  powerful  (but  more  expensive) 
structural  matcher,  based  on  the  structure-mapping  notion  of  literal  similarity  (Gentner,  1983). 

We  begin  in  Section  2  by  briefly  reviewing  psychological  evidence  on  similarity-based  retrieval  and 
mapping,  thereby  extracting  some  criteria  which  retrieval  models  must  satisfy.  This  section  also  outlines 
the  computational  issues  raised  by  similarity-based  retrieval,  drawing  on  the  AI  literature  as  necessary. 
Section  3  describes  the  MAC/FAC  model,  showing  how  it  satisfies  the  psychological  and  computational 
desiderata. Section 4 illustrates the model's psychological plausibility by simulating the results of a psychological
experiment.  Section  5  explores  the  consequences  of  different  design  decisions  by  sensitivity  analyses  at  the 
level  of  algorithms,  demonstrating  that  the  model’s  performance  depends  on  the  theoretically  important 
parameters.  Section  6  compares  MAC/FAC  with  ARCS,  the  closest  competing  model  of  similarity-based 
retrieval,  demonstrating  that  MAC/FAC  performs  well  on  databases  designed  by  others  (e.g.,  the  ARCS 
datasets)  and  that  MAC/FAC’s  performance  fits  the  psychological  evidence  better  than  ARCS.  Finally, 
Section  7  compares  MAC/FAC  to  several  other  memory  models,  analyzes  some  of  its  limitations,  and 
discusses  possible  extensions. 


2.  Framework 

Similarity-based  transfer  can  be  decomposed  into  subprocesses.  Given  that  a  person  has  some  current 
target  situation  in  working  memory,  transfer  from  prior  knowledge  requires  at  least 


1. Accessing a similar (base) situation in long-term memory,

2.  Creating  a  mapping  from  the  base  to  the  target,  and 

3.  Evaluating  the  mapping. 

In  this  case  the  base  is  an  item  from  memory,  and  the  target  is  the  probe;  that  is,  we  think  of  the  retrieved 
memory items as mapped to the probe. Other processes may also occur, such as verifying new inferences about
the target (Clement, 1986), elaborating the base and target (Ross, 1987; Falkenhainer, 1988), adapting or
tweaking the domain representations to improve the match (Falkenhainer, 1990; Holyoak & Novick, in
press; Kass, 1986, 1989), and abstracting the common structure from base and target (Gick & Holyoak,
1983; Skorstad, Gentner & Medin, 1988; Winston, 1982), but our focus is on the first three processes.


2.1 Structure-Mapping and the typology of similarity

The  process  of  mapping  aligns  two  representations  and  uses  this  alignment  to  generate  analogical 
inferences  (Gentner,  1983,  1988,  1989a).  Alignment  occurs  via  matching,  which  creates  correspondences 
between  items  in  the  two  representations.  Analogical  inferences  are  generated  by  using  the 
correspondences  to  import  knowledge  from  the  base  representation  into  the  target.  The  mapping  process  is 
assumed to be governed by the constraints of structural consistency: one-to-one mapping and parallel
connectivity. One-to-one mapping means that an interpretation of a comparison cannot align (i.e., place
into correspondence) the same item in the base with multiple items in the target, or vice-versa. Parallel
connectivity  means  that  if  an  interpretation  of  a  comparison  aligns  two  statements,  their  arguments  must 
also  be  placed  into  correspondence.1  In  this  account,  similarity  is  defined  in  terms  of  correspondences 
between  structured  representations  (Gentner  1983;  Gentner  &  Markman,  1993, 1994,  in  press;  Goldstone, 
Medin & Gentner, 1991; Markman & Gentner, 1990, 1993a,b; Medin, Goldstone & Gentner, 1993).
Matches  can  be  distinguished  according  to  the  kinds  of  commonalities  present.  An  analogy  is  a  match 
based  on  a  common  system  of  relations,  especially  involving  higher-order  relations.2  A  literal  similarity 
match  includes  both  common  relational  structure  and  common  object  descriptions.  A  surface  similarity  or 
mere-appearance  match  is  based  primarily  on  common  object  descriptions,  with  perhaps  a  few  shared 
first-order  relations. 

There  is  considerable  evidence  that  the  mapping  process  is  sensitive  to  structural  commonalities.  People 
can readily align two situations, preserving structurally important commonalities, making the appropriate
lower-order  substitutions,  and  mapping  additional  predicates  into  the  target  as  candidate  inferences.  For 
example,  Clement  &  Gentner  (1991)  showed  people  analogies  and  asked  which  of  two  lower-order 
assertions,  both  shared  by  base  and  target,  was  most  important  to  the  match.  Subjects  chose  assertions  that 
were  connected  to  matching  causal  antecedents:  that  is,  their  choice  was  based  not  only  on  the  goodness  of 
the  local  match  but  on  whether  it  was  connected  to  a  larger  matching  system.  In  a  second  study,  subjects 
were  asked  to  make  a  new  prediction  about  the  target  based  on  the  analogy  with  the  base  story.  They  again 
showed  sensitivity  to  connectivity  and  systematicity  in  choosing  which  predicates  to  map  as  candidate 
inferences  from  base  to  target.  Evidence  for  structural  consistency  in  mapping  comes  from  a  study  by 
Spellman and Holyoak (1992). They asked people to explicate the analogy between the Gulf War and
World War II, assuming Saddam Hussein maps onto Hitler. Although people were divided in their
mappings, they were highly consistent. People who mapped Bush onto Churchill mapped the current USA
onto WWII Britain and people who mapped Bush onto FDR mapped the USA today onto USA during
World War II.


1  Previously  we  used  the  term  structurally  grounded  for  parallel  connectivity. 

2  We  define  the  order  of  an  item  in  a  representation  as  follows:  Objects  and  constants  are  order  0.  The 
order  of  a  statement  is  one  plus  the  maximum  of  the  order  of  its  arguments. 
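
The definition of order in footnote 2 is recursive and can be sketched directly. The nested-tuple encoding of statements and the example predicates (ATTRACTS, REVOLVES-AROUND) are assumptions of this sketch.

```python
def order(item):
    """Order of an item: 0 for objects and constants,
    otherwise one plus the maximum order of its arguments."""
    if not isinstance(item, tuple):
        return 0  # objects and constants are plain strings in this sketch
    args = item[1:]  # item[0] is the functor
    return 1 + max((order(a) for a in args), default=0)

attracts = ("ATTRACTS", "sun", "planet")          # first-order relation
revolves = ("REVOLVES-AROUND", "planet", "sun")   # first-order relation
cause = ("CAUSE", attracts, revolves)             # higher-order relation

print(order("sun"), order(attracts), order(cause))
```

A statement such as CAUSE, whose arguments are themselves statements, comes out as order 2, which is what makes it a higher-order relation in the sense used in the text.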

The degree of relational match is also important in determining people's evaluations of comparisons.
People  rate  metaphors  as  more  apt  when  they  are  based  on  relational  commonalities  than  when  they  are 
based  on  common  object-descriptions  (Gentner,  1988b;  Gentner  &  Clement,  1988).  Gentner,  Rattermann 
and  Forbus  (1993)  asked  subjects  to  rate  the  soundness  and  similarity  of  story  pairs  that  varied  in  which 
kinds  of  commonalities  they  shared.  Subjects'  soundness  and  similarity  ratings  were  substantially  greater 
for  pairs  that  shared  higher-order  relational  structure  than  for  those  that  did  not  (Gentner  &  Landers,  1985; 
Gentner,  Rattermann,  &  Forbus,  in  press;  Rattermann  &  Gentner,  1987).  Common  relational  structure 
also  contributes  strongly  to  judgments  of  perceptual  similarity  (Goldstone,  Medin  &  Gentner,  1991)  as  well 
as to the way in which people align pairs of pictures in a mapping task (Markman & Gentner, 1990,
1993b) and determine common and distinctive features (Gentner & Markman, 1994; Markman & Gentner,
1993a). 

Any  model  of  human  similarity  and  analogy  must  capture  this  sensitivity  to  structural  commonality.  To  do 
so, it must involve structural representations and processes that operate to align them (Barnden, 1994;
Gentner & Markman, in press; Goldstone, Medin & Gentner, 1991; Holyoak, Novick, & Melz, 1994;
Keane, 1988a,b; Markman & Gentner, 1993; Medin, Goldstone, & Gentner, 1993; Reed, 1987; Reeves &
Weisberg,  1994).  This  would  seem  to  require  abandoning  some  highly  influential  models  of  similarity:  e.g., 
modeling  similarity  as  the  intersection  of  independent  feature  sets  or  as  the  dot  product  of  feature  vectors. 
However,  we  show  below  that  a  variant  of  these  nonstructural  models  can  be  useful  in  describing  memory 
retrieval. 


2.1.1  Similarity-based  Access  from  Long-term  Memory 

There  is  considerable  evidence  that  access  to  long-term  memory  relies  more  on  surface  commonalities  and 
less  on  structural  commonalities  than  does  mapping.  For  example,  people  often  fail  to  access  potentially 
useful  analogs,  as  in  Gick  and  Holyoak's  (1980, 1983)  dramatic  demonstration.  When  subjects  were  told  a 
story  and  then  given  an  analogous  problem  to  solve,  about  30%  solved  the  problem.  However,  if  subjects 
were simply told to think about the story they had heard, 80% to 90% solved the problem. We can infer
that  most  of  the  subjects  retained  representations  of  the  prior  story  sufficient  to  provide  a  useful  analogy, 
but  that  hearing  the  structurally  analogous  problem  did  not  provide  spontaneous  access  to  the  story 
representation  in  memory.  Other  research  has  shown  that,  although  people  in  a  problem-solving  task  are 
often  reminded  of  prior  problems,  these  remindings  are  often  based  on  surface  similarity  rather  than  on 
structural similarities between the solution principles (Holyoak & Koh, 1987; Keane, 1987, 1988b; Novick,
1988; Reed, Ernst, & Banerji, 1974; Ross, 1984, 1987, 1989; see also the comprehensive review by
Reeves  &  Weisberg,  1994). 

The  experiments  we  will  model  here  investigated  which  kinds  of  similarities  led  to  the  best  retrieval  from 
long-term  memory  (Gentner  &  Landers,  1985;  Gentner,  Rattermann,  &  Forbus,  1993;  Rattermann  & 
Gentner,  1987).  Subjects  were  first  given  a  relatively  large  memory  set  (the  “Karla  the  hawk”  stories). 
About  a  week  later,  subjects  were  given  new  stories  that  resembled  the  original  stories  in  various  ways  and 
were  asked  to  write  out  any  remindings  they  experienced  to  the  prior  stories  while  reading  the  new  stories. 
Finally,  they  rated  all  the  pairs  for  soundness  —  i.e.,  how  well  inferences  could  be  carried  from  one  story  to 
the other. The results showed a marked dissociation between retrieval and subjective soundness and
similarity.  Surface  similarity  was  the  best  predictor  of  memory  access  and  structural  similarity  was  the 
best  predictor  of  subjective  soundness.  This  dissociation  held  not  only  between  subjects,  but  also  within 
subjects.  That  is,  subjects  given  the  soundness  task  immediately  after  the  cued  retrieval  task  judged  that 
the  very  matches  that  had  come  to  their  minds  most  easily  (the  surface  matches)  were  highly  unsound  (i.e., 
unlikely  to  be  useful  in  inference).  This  suggests  that  similarity-based  access  may  be  based  on  qualitatively 
distinct processes from analogical inferencing.

It is not the case that higher-order relations contribute nothing to retrieval. Adding higher-order relations
led  to  nonsignificantly  more  retrieval  in  two  studies  and  to  a  small  but  significant  benefit  in  the  third.  Other 
research  has  shown  positive  effects  of  higher-order  relational  matches  on  retrieval,  especially  in  cases 
where subjects have been brought to do intensive encoding of the original materials (Faries & Reiser,
1988) or are expert in the domain (Novick, 1988a, b). But higher-order commonalities have a much bigger
effect  on  mapping  once  the  two  analogs  are  present  than  they  do  on  similarity-based  retrieval,  and  the 
reverse  is  true  for  surface  commonalities. 

These results place several constraints on a computational model of similarity-based retrieval. The first two
criteria  ensure  that  the  model  can  provide  an  account  of  mapping  and  inference. 

Structured  representation  criterion:  The  model  must  be  able  to  store  structured  representations. 

Structured  mappings  criterion:  The  model  must  incorporate  processes  of  structural  mapping  (i.e., 
alignment  and  transfer)  over  its  representations. 


The  remaining  four  criteria  summarize  the  pattern  of  retrieval  results: 

Primacy  of  the  mundane  criterion:  The  majority  of  retrievals  should  be  literal  similarity  matches:  i.e., 
matches high in both structural and surface commonalities.

Surface  superiority  criterion:  Retrievals  based  on  surface  similarity  are  frequent. 

Rare  insights  criterion:  Relational  remindings  must  occur  at  least  occasionally,  with  lower  frequency  than 
literal  similarity  or  surface  remindings. 

Scalability  criterion:  The  model  must  be  plausibly  capable  of  being  extended  to  large  memory  sizes. 


No  current  model  of  transfer  succeeds  in  satisfying  all  six  criteria.  There  are  two  major  approaches  to 
memory models: Indexing models, commonly used in case-based reasoning work, and feature-vector
models,  commonly  used  in  mathematical  modeling  of  human  memory.  We  examine  the  trade-offs  of  each  in 
turn. 

Most  case-based  reasoning  models  (Kass,  1986,  1989;  Kolodner,  1984,  1988,  1989,  1993;  Schank,  1982) 
use  structured  representations,  and  focus  on  the  process  of  adapting  and  applying  old  cases  to  new 
situations.  Such  models  satisfy  the  structured  representation  and  structured  mappings  criteria.  However, 
such  models  also  typically  presume  a  highly  indexed  memory  in  which  the  vocabulary  used  for  indexing 
captures  significant  higher-order  abstractions  such  as  themes  and  principles.  Viewed  as  psychological 
accounts,  these  models  would  predict  that  people  should  typically  access  the  best  structural  match.  That 
prediction  fails  to  match  the  pattern  of  psychological  results  summarized  by  the  primacy  of  the  mundane 
and  surface  superiority  criteria.  Scalability  is  also  an  open  question  at  this  time,  since  no  one  has  yet 
accumulated and indexed a large (1,000 to 10^6) corpus of structured representations.

The  reverse  set  of  advantages  and  disadvantages  holds  for  approaches  that  model  similarity  as  the  result  of 
a  dot  product  (or  some  other  simple  operation)  over  feature  vectors,  as  in  many  mathematical  models  of 
human  memory  (e.g.,  Gillund  &  Shiffrin,  1984;  Hintzman,  1986,  1988;  Medin  &  Schaffer,  1978;  but  see 
Murphy & Medin, 1985) as well as in many connectionist models of learning (e.g., Smolensky, 1988; see
also the reviews by Humphreys, Bain & Pike, 1989, and Ratcliff, 1990). These models typically utilize
nonstructured knowledge representations and relatively simple match processes and hence do not allow for
structural matching and inference. Such models also tend to utilize a unitary notion of similarity, an
assumption that is called into question by the dissociation described above (see also Medin, Goldstone &
Gentner, 1993; Markman & Gentner, 1993). However, the use of feature vectors has some advantages for
modeling  access  to  long-term  memory.  The  computations  are  simple  enough  to  make  it  feasible  to  compute 
many  matches  and  choose  the  best,  thus  satisfying  the  scalability  criterion.  Further,  because  object 
features  are  included  in  the  feature  vectors,  these  models  should  be  able  to  capture  the  surface  superiority 
criterion  and  in  many  cases  the  primacy  of  the  mundane  criterion.  (Failures  on  the  latter  will  occur  for 
cross-mappings,  when  the  objects  and  relations  match  but  their  bindings  do  not.)  It  should  be  noted  that 
some  case-based  reasoning  work  also  restricts  itself  to  feature-vector  representations  and  thus  has  the  same 
strengths  and  weaknesses  (e.g.,  Stanfill  &  Waltz,  1986). 

The  MAC/FAC  model  seeks  to  combine  the  advantages  of  both  approaches.  We  turn  now  to  its 
description. 


3.  The  MAC/FAC  model 

The  complexity  of  the  phenomena  in  similarity-based  access  suggests  a  two-stage  model.  Consider  the 
computational constraints on access. The large number of cases in memory and the speed of human access
suggest a computationally cheap process. But the requirement of judging soundness, essential to
establishing  whether  a  match  can  yield  useful  results,  suggests  an  expensive  match  process.  A  common 
computational  solution  to  such  problems  is  to  use  a  two-stage  process,  in  which  a  cheap  filter  is  used  to 
pick out a subset of likely candidates for more expensive processing (cf. Bareiss & King, 1989).
MAC/FAC uses this strategy. The dissociation noted previously can be understood in terms of the
interactions  of  its  two  stages. 

Figure  1  illustrates  the  components  of  the  MAC/FAC  model.  The  inputs  are  a  pool  of  memory  items  and  a 
probe,  i.e.,  a  description  for  which  a  match  is  to  be  found.  The  output  is  an  item  from  memory  (i.e.,  a 
structured  description)  and  a  comparison  of  this  item  with  the  probe.  (Section  3.1  below  describes  exactly 
what a comparison is.) Internally there are two stages. The MAC stage provides a cheap but non-structural
filter, which only passes on a handful of items. The FAC stage uses a more expensive but more
accurate structural match to select the most similar item(s) from the MAC output and produce a full
structural  alignment.  Each  stage  consists  of  matchers,  which  are  applied  to  every  input  description,  and  a 
selector,  which  uses  the  evaluation  of  the  matchers  to  select  which  comparisons  are  produced  as  the  output 
of  that  stage.  Conceptually,  matchers  are  applied  in  parallel  within  each  stage. 
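
The matcher-plus-selector organization of the two stages can be sketched as follows. This is an illustration only. The content-vector construction (counting functor occurrences), the selection rule (keep items scoring within a fixed fraction of the best), the default fractions, and the stand-in scorer `toy_structural_score` are assumptions of this sketch; a real FAC stage would run SME in place of the toy scorer.

```python
from collections import Counter

def functors(item):
    """Yield the functor of every statement nested inside an item."""
    if isinstance(item, tuple):
        yield item[0]
        for arg in item[1:]:
            yield from functors(arg)

def content_vector(description):
    """Assumed encoding: sparse functor counts over a list of statements."""
    return Counter(f for stmt in description for f in functors(stmt))

def dot(u, v):
    return sum(count * v[f] for f, count in u.items())

def select(scored, fraction):
    """Selector: keep items with a positive score within `fraction` of the best."""
    if not scored:
        return []
    best = max(score for score, _ in scored)
    return [(s, item) for s, item in scored if s > 0 and s >= fraction * best]

def mac_fac(probe, memory, structural_score, mac_fraction=0.9, fac_fraction=0.9):
    # MAC: cheap, non-structural matcher applied to every memory item.
    pv = content_vector(probe)
    mac_out = select([(dot(pv, content_vector(m)), m) for m in memory], mac_fraction)
    # FAC: expensive structural matcher applied only to the MAC survivors.
    return select([(structural_score(probe, m), m) for _, m in mac_out], fac_fraction)

def toy_structural_score(probe, item):
    """Hypothetical stand-in for SME: counts probe statements whose
    top-level functor also heads some statement in the item."""
    heads = {stmt[0] for stmt in item}
    return sum(1 for stmt in probe if stmt[0] in heads)

probe = [("CAUSE", ("BITE", "dog", "man"), ("ANGRY", "man"))]
memory = [
    [("CAUSE", ("BITE", "cat", "boy"), ("ANGRY", "boy"))],
    [("BITE", "dog", "man")],
    [("SLEEP", "dog")],
]
result = mac_fac(probe, memory, toy_structural_score)
```

Only the first memory item survives the MAC stage here, since its content vector shares all three functors with the probe; the structural scorer is then run on that single survivor rather than on the whole pool.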

We  make  minimal  assumptions  concerning  the  global  structure  of  long-term  memory.  We  assume  here 
only  that  there  is  a  large  pool  of  descriptions  from  which  we  must  select  one  or  a  few  that  are  most  similar 
to  a  probe.  We  are  uncommitted  as  to  whether  the  pool  is  the  whole  of  long-term  memory  or  a  subset 
selected  via  some  other  method,  e.g.,  spreading  activation. 

Figure 1: The MAC/FAC model


We  begin  by  describing  the  FAC  stage.  In  doing  so,  we  also  describe  the  computational  framework  which 
underlies  MAC  and  FAC,  including  our  conventions  for  representation  and  the  information  about  the  SME 
algorithm  that  is  required  to  fully  understand  MAC/FAC. 

3.1  The  FAC  stage  and  SME 

The  FAC  stage  takes  as  input  the  descriptions  selected  by  the  MAC  stage,  and  computes  a  full  structural 
match  between  each  item  and  the  probe.  We  model  the  FAC  stage  by  using  SME,  the  Structure-Mapping 
Engine  (Falkenhainer,  Forbus  &  Gentner,  1986,  1989).  Here  we  briefly  summarize  SME's  operation,  both 
by  way  of  describing  the  FAC  stage  and  to  provide  the  vocabulary  needed  to  motivate  the  design  of  the 
MAC  stage. 

SME  is  an  analogical  matcher  designed  as  a  simulation  of  structure-mapping  theory.  It  takes  two  inputs,  a 
base  description  and  a  target  description.  (For  simplicity  we  speak  of  these  descriptions  as  being  made  up 
of  items,  meaning  both  objects  and  statements  about  these  objects.)  It  computes  a  set  of  global 
interpretations  of  the  comparison  between  base  and  target.  Each  global  interpretation  includes 

•  A  set  of  correspondences  which  pair  specific  items  in  the  base  representation  to  specific  items  in  the 
target. 

•  A  structural  evaluation  reflecting  the  estimated  soundness  of  the  match.  In  subsequent  processing,  the 
structural  evaluation  provides  one  source  of  information  about  how  seriously  to  take  the  match. 

•  A  set  of  candidate  inferences,  potential  new  knowledge  about  the  target  which  are  suggested  by  the 
correspondences  between  the  base  and  target.  Candidate  inferences  are  what  give  analogy  its 
generative  power,  since  they  represent  the  importation  of  new  knowledge  into  the  target  description. 
However,  they  are  only  conjectures;  they  must  be  tested  and  evaluated  by  other  means. 


We  can  illustrate  these  ideas  with  the  Rutherford  analogy,  which  describes  the  structure  of  the  atom  in 
terms  of  that  of  the  solar  system.  The  solar  system  is  the  base  description  and  the  atom  is  the  target 
description. 

•  Rutherford  paired  the  Sun  to  the  nucleus  and  the  planets  to  the  electrons.  These  correspondences  seem 
reasonable  not  because  of  intrinsic  object  similarities  but  because  they  allow  various  relational 
statements also to be placed in correspondence (i.e., aligned): e.g., the relative masses of the objects
and the fact that the planets/electrons revolve around the Sun/nucleus.

• This interpretation is a selection from among many common relations. It focuses on the causal system
of a central gravitational/electromagnetic force, the relative mass of the two bodies within each system,
and  the  fact  that  the  less  massive  body  revolves  around  the  heavier  body.  Other  common  relations  — 
such  as  the  relative  temperatures  or  differences  in  color  of  the  two  objects  —  that  do  not  belong  to  a 
common  connected  system  are  not  included  in  the  interpretation.  We  refer  to  this  preference  for 
connected  systems  of  common  predicates  as  the  systematicity  principle. 

•  The  preferred  interpretation  might  also  sanction  new  conjectures  about  the  atom,  such  as  that  the  cause 
of  the  electrons  revolving  around  the  nucleus  is  the  existence  of  an  attractive  force.3 

The interpretations produced by SME are structurally consistent, in that they satisfy the constraints of one-to-one
mapping and parallel connectivity, as defined in Section 2.1. These constraints are important
because  they  allow  for  the  generation  of  coherent  candidate  inferences.  The  systematicity  constraint  is 
important  because  it  captures  the  human  preference  for  aligning  connected  systems  of  predicates  (e.g., 
logical  arguments  or  causal  sequences).  In  addition,  SME  attempts  to  find  maximal  interpretations.  An 
interpretation  is  maximal  if  adding  any  additional  correspondences  would  render  it  structurally  inconsistent. 
Maximality is important both because it reduces the number of possible interpretations and because it
ensures  that  the  full  structural  implications  of  a  set  of  correspondences  will  be  considered. 

Before  describing  the  SME  algorithm  further,  some  conventions  concerning  representation  are  in  order. 

We  use  infix  notation  or  Lisp  prefix  syntax  for  statements  as  appropriate.  We  use  the  term  functor  of  a 
statement  as  a  general  term  for  the  relation  or  function  or  connective  that  takes  the  remaining  parts  of  the 
statement  as  its  arguments.  For  example, 

1. In GREATER-THAN(HEIGHT(A), HEIGHT(B)), the functor is GREATER-THAN.

2. In NOT(ABOVE(B, A)), the functor is NOT.

3. In HEIGHT(A), the functor is HEIGHT.

4. In RED(A), the functor is RED.
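
The notion of a functor can be made concrete for statements written in Lisp prefix syntax. The sketch below assumes well-formed input; the helper names `parse` and `functor` are hypothetical and are not part of SME.

```python
def parse(text):
    """Parse one Lisp-prefix statement, e.g. "(NOT (ABOVE B A))", into nested tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # an entity or constant
        items = []
        while tokens[pos] != ")":
            items.append(read())
        pos += 1  # skip the closing parenthesis
        return tuple(items)

    return read()

def functor(statement):
    """The functor is the first element; the remaining elements are its arguments."""
    return statement[0]

for text in ["(GREATER-THAN (HEIGHT A) (HEIGHT B))",
             "(NOT (ABOVE B A))",
             "(HEIGHT A)",
             "(RED A)"]:
    print(functor(parse(text)))
```

The four statements are the prefix-syntax equivalents of the four examples above.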


3 Incorrect candidate inferences are also possible, e.g., that the attractive force in the atom is gravity.
What counts as a candidate inference versus as alignable (or non-alignable) structure depends on the
reasoner's state of knowledge about the target.

Example #1 is an example of a relation. Relations range over truth values, and their arguments can be
entities or other statements. Relations always have multiple arguments, with the exception of logical
connectives (e.g., Example #2), which are always treated as relations regardless of the number of
arguments. For the purposes of structure-mapping, modal operators and other higher-order predicates are
classified as relations. Example #3 is an example of a function, which maps one or more entities into
another entity or constant. In our psychological modeling, functions are often used to represent known
dimensions or components of structured objects (e.g., height, pressure, or color). Example #4 is an
example of an attribute, an atomic description of some property of an entity. Attributes take only one
argument,  to  capture  the  notion  of  a  unitary  description.  This  of  course  does  not  mean  that  attributes 
cannot  be  decomposed.  For  instance,  the  following  forms  are  logically  equivalent: 

• RED(A)

• COLOR-OF(A, red)

• COLOR(A) = red

However,  we  use  these  three  distinct  forms  to  represent  distinct  psychological  constructs.  Roughly,  the 
first,  an  attribute,  indicates  that  the  subject  thinks  of  redness  as  a  quality  of  the  object.  The  second,  a 
relation,  indicates  that  the  subject  has  to  some  degree  disengaged  redness  from  the  object  and  sees  color  as 
a  relationship  between  an  object  and  a  set  of  possible  values.  The  third,  a  function,  indicates  that  the 
subject  conceives  of  color  as  a  dimension  of  general  application  and  thinks  of  the  color  of  A  as  a  value 
along  this  dimension.  We  view  this  kind  of  dimensional  representation  as  important  because  dimensions 
may  in  the  process  of  comparison  be  aligned  with  quite  different  dimensions  (e.g.,  HEIGHT  and 
DARKNESS).  Thus  qualities  that  are  conceived  of  as  dimensions  are  more  likely  to  participate  in 
systematic  cross-dimensional  matches.  (For  the  implications  of  this  idea  in  analogical  development,  see 
Gentner & Rattermann, 1990; Gentner, Rattermann, Markman & Kotovsky, in press; Kotovsky &
Gentner, 1988.)

With  these  conventions  in  mind,  let  us  turn  to  the  SME  algorithm.  SME  operates  via  a  local-to-global 
process.  Conceptually,  its  operation  can  be  divided  into  four  phases.  The  first  phase  constructs  a  network 
of  local  matches  between  items  in  the  base  and  target.  The  second  phase  constructs  global  interpretations 
by  coalescing  structurally  consistent  combinations  of  local  matches.  The  third  phase  computes  the 
structural  evaluation,  and  the  fourth  phase  computes  candidate  inferences  for  each  interpretation.  We 
examine  each  in  turn. 

SME  begins  by  finding  all  possible  local  matches  between  statements  in  the  base  and  statements  in  the 
target. A local match is created between base item Bi and target item Tj when either:

1. Bi and Tj are both statements whose functors are sufficiently alike (typically identical, but see below).

2. Bi and Tj are corresponding arguments of other statements which are connected by a local match and
are both either objects or functions.

For  instance,  given  the  base  item  B1  and  the  target  item  T1  defined  as: 

B1: (CAUSE Event17 Event31)

T1: (CAUSE Event5 Event63)

a match would be hypothesized between B1 and T1 because their functors (i.e., CAUSE) are identical. This
local match suggests in turn hypothesizing that Event17 and Event5 match, and also that Event31 and
Event63 match. Each suggested match leads to the creation of new local matches involving the arguments of
the  statement  if  either  (a)  both  are  entities  (e.g.,  objects  or  constants),  (b)  both  are  terms  involving 
functions,  which  are  an  indirect  means  of  referring  to  entities  or  dimensions,  or  (c)  both  are  expressions 
whose  functors  match.  Here  is  an  example  of  substitution  involving  functions: 

B2: (PRESSURE Water32)

T2: (TEMPERATURE Brick45)

B2 and T2 could be placed into correspondence if they were the arguments of some other matching pair of
statements since PRESSURE and TEMPERATURE are both functions (in this case referring to values on physical
dimensions of the respective objects).
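The two match-creation rules can be sketched in a few lines of Python. This is an illustrative sketch only, not the SME implementation: the tuple encoding of statements, the names `FUNCTIONS`, `local_matches` and `propose`, and the choice to seed matches only from non-function functors are all assumptions made for the example.

```python
from itertools import product

# Assumed for illustration: which functors denote functions (dimensions).
FUNCTIONS = {"PRESSURE", "TEMPERATURE"}

def is_expr(item):
    """Statements and function terms are tuples (functor, arg, ...); entities are strings."""
    return isinstance(item, tuple)

def subexpressions(item):
    """Yield an expression and every expression nested inside it."""
    if is_expr(item):
        yield item
        for arg in item[1:]:
            yield from subexpressions(arg)

def local_matches(base, target):
    """Return the set of (base_item, target_item) local matches."""
    matches = set()

    def propose(b, t):
        if (b, t) in matches:
            return
        matches.add((b, t))
        if not (is_expr(b) and is_expr(t)):
            return  # entity matches end the recursion
        for b_arg, t_arg in zip(b[1:], t[1:]):
            both_entities = not is_expr(b_arg) and not is_expr(t_arg)
            both_exprs = is_expr(b_arg) and is_expr(t_arg)
            both_functions = (both_exprs and b_arg[0] in FUNCTIONS
                              and t_arg[0] in FUNCTIONS)
            same_functor = both_exprs and b_arg[0] == t_arg[0]
            if both_entities or both_functions or same_functor:
                propose(b_arg, t_arg)

    base_exprs = {e for s in base for e in subexpressions(s)}
    target_exprs = {e for s in target for e in subexpressions(s)}
    # Rule 1: seed with statement pairs whose (non-function) functors are identical.
    for b, t in product(base_exprs, target_exprs):
        if b[0] == t[0] and b[0] not in FUNCTIONS and len(b) == len(t):
            propose(b, t)  # Rule 2 is applied recursively inside propose()
    return matches

B1 = ("CAUSE", "Event17", "Event31")
T1 = ("CAUSE", "Event5", "Event63")
cause_matches = local_matches([B1], [T1])
```

For B1 and T1 the sketch yields the statement match plus the two event matches. Function terms such as B2 and T2 are never seeded on their own; they are paired only when they appear as corresponding arguments of an already matched statement.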

The  idea  that  two  statements  can  match  only  if  their  relational  predicates  are  “sufficiently  alike”  is  based  on 
the claim that some common relational content is required in analogy. We disagree with Holyoak and
Thagard's  (1989)  claim  that  pure  structural  isomorphisms  can  qualify  as  analogies.  They  present  the 
following  pair: 

Bill  is  smart  and  tall.  Rover  is  hungry  and  friendly. 

Steve  is  smart.  Fido  is  hungry. 

Tom  is  timid  and  tall.  Blackie  is  frisky  and  friendly. 


Holyoak  and  Thagard  (1989,  p.  343)  note  that  ACME  (and  five  out  of  the  eight  subjects  tested)  could 
match  this  pair  and  agree  on  the  best  attribute  correspondences.  But  the  fact  that  it  can  be  solved  is  not 
decisive:  we  would  suggest  that  it  is  taken  as  a  logical  puzzle  to  be  solved  for  the  best  correspondences,  not 
as  an  analogy.  The  trouble  with  accepting  pure  graph  matches  is  that  it  leads  to  the  claim  that  pairs  like  (1) 
and  (2)  below  are  analogies,  which  seems  patently  untrue: 


(1)  Fred  loves  New  York.  (2)  General  Motors  sells  cars. 

Note  that  it  is  the  relational  meaning  that  must  be  shared;  (2)  and  (3)  form  an  analogy  but  (2)  and  (4)  do 
not: 

(3)  Fred  peddles  popsicles.  (4)  General  Motors  heads  the  list. 


The  question  is  how  to  formalize  this  requirement  of  common  relational  content.  Structure-mapping  uses 
the idea of tiered identicality. The default criterion for “sufficiently alike” for predicates other than
functions  is  that  the  predicates  are  identical.  We  call  this  the  simple  identicality  criterion.  Simple 
identicality  of  conceptual  relations  is  an  excellent  first-pass  criterion  because  it  is  computationally  cheap. 
The notion of simple identicality might suggest an inability to process any matches other than literal
matches. This is not the case. First, we assume that input representations are canonical conceptual
representations, not semi-verbal strings. Second, functions, which represent domain dimensions, can be
matched non-identically if they are embedded in matching relational structure. This ability to align
non-identical functions provides considerable flexibility. This is what allows SME to make cross-dimensional
matches,  as  when  we  interpret  “Sally  is  sharper  than  Bill”  to  mean  that  Sally  is  smarter  than  Bill. 
However,  there  are  circumstances  where  criteria  requiring  more  processing  are  worthwhile  (e.g.,  when 
placing  two  items  in  correspondence  would  allow  a  larger,  or  very  relevant,  structure  to  be  mapped,  as  in 
Falkenhainer’s  (1987, 1990)  work).  In  these  circumstances  weaker  criteria  (in  that  they  allow  more  items 
to  match)  that  involve  more  processing  are  allowed.  One  such  test  is  minimal  ascension  (Falkenhainer, 
1987, 1990)  which  allows  two  items  to  be  placed  into  correspondence  if  their  predicates  have  close 
common  superordinates.  Another  technique  is  decomposition:  Two  concepts  that  are  similar  but  not 
identical  (such  as  “bestow”  and  “bequeath”)  are  decomposed  into  a  canonical  representation  language  so 
that their similarity is expressed as a partial identity (here, roughly “give”). Decomposition is the simplest
form of re-representation (Gentner, 1989; Gentner & Rattermann, 1991), where additional knowledge is
used to reformulate a description in order to achieve a better match. In this paper we only use SME with
the first-level identicality constraint. As Section 6 argues, this simple constraint seems to provide a better
psychological  account  than  more  complex  constraints  do. 

The  process  of  using  matches  to  propose  lower  matches  is  recursive,  ending  with  entity  matches.  SME 
does  not  try  matches  between  every  pair  of  objects  in  base  and  target:  It  only  hypothesizes  object  matches 
when there is some aspect of the relational structure which suggests that the objects might correspond.
This leads to substantial efficiencies over purely bottom-up matchers, such as that of Winston (1980).

The  output  of  the  first  phase  is  a  network  of  match  hypotheses,  each  representing  a  local  match  between  an 
item  of  the  base  and  target.  At  this  stage  the  network  is  incoherent.  The  set  of  correspondences  taken  as  a 
whole  is  structurally  inconsistent,  often  including  N-to-one  mappings.  Further,  this  initial  network  may 
contain  match  hypotheses  that  are  not  grounded  and  so  can  never  be  part  of  any  global  interpretation.  A 
match  hypothesis  is  grounded  if  a  recursive  chain  of  correspondences  from  it  through  its  arguments  exists 
all  the  way  down  to  entities.  Only  grounded  match  hypotheses  can  participate  in  global  interpretations. 
Otherwise,  global  interpretations  might  include  statements  whose  arguments  did  not  match,  which  would 
violate  the  parallel  connectivity  constraint. 
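The groundedness test can be illustrated with a short recursive check. This is a sketch under assumptions, not the SME code: statements are encoded as tuples `(functor, arg, ...)`, entities as strings, and `is_grounded` is a hypothetical helper name.

```python
def is_grounded(match, matches):
    """True if a chain of correspondences leads from `match` down to entities."""
    base_item, target_item = match
    base_is_expr = isinstance(base_item, tuple)
    target_is_expr = isinstance(target_item, tuple)
    if not base_is_expr and not target_is_expr:
        return True  # entity-to-entity matches are the ground level
    if not (base_is_expr and target_is_expr) or len(base_item) != len(target_item):
        return False
    # Every pair of corresponding arguments must itself be a grounded match.
    return all(
        (b_arg, t_arg) in matches and is_grounded((b_arg, t_arg), matches)
        for b_arg, t_arg in zip(base_item[1:], target_item[1:])
    )

# A statement match whose argument matches are missing is not grounded.
B1 = ("CAUSE", "Event17", "Event31")
T1 = ("CAUSE", "Event5", "Event63")
partial = {(B1, T1), ("Event17", "Event5")}
complete = partial | {("Event31", "Event63")}
```

With `partial`, the match between B1 and T1 fails the test because one argument pair has no match hypothesis; with `complete`, it passes.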

Looking  at  a  simple  example  makes  this  process  clearer.  Figure  2  shows  two  drawings  used  in 
psychological  experiments  concerning  analogy,4  with  a  propositional  representation  of  these  pictures 
suitable  for  simulation  shown  in  Figure  3.  The  right  hand  side  of  Figure  3  shows  the  propositions  in 
standard  logical  format,  while  the  left  hand  side  contains  an  equivalent  graphical  representation  which  is 
useful  for  understanding  the  match  process.  Figure  4  illustrates  the  match  hypotheses  computed  by  SME 
for  these  descriptions. 


4 We thank Arthur Markman for the drawings of Figure 2 and the corresponding representations.

Figure 2: Two simple situations


Even though the initial network of match hypotheses is structurally inconsistent, it contains every consistent
interpretation  of  the  match;  global  interpretations  emerge  out  of  the  initial  network.  Thus  the  maximum  size 
of  any  global  interpretation,  as  measured  in  number  of  correspondences,  is  limited  by  the  size  of  this 
network.  We  exploit  this  fact  in  Section  3.3. 




Jack's Robot Repair description:

(REASON (REPAIRING ROBOTJ CAR54)
        (BROKEN CAR54))
(CAUSE (TYPING-AT JACK (KEYBOARD ROBOTJ))
       (REPAIRING ROBOTJ CAR54))
(CAUSE (REPAIRING ROBOTJ CAR54)
       (USING ROBOTJ HANDTOOLSJ))
(DOOR DOORJ)
(JOINTED ROBOTJ)
(METALLIC ROBOTJ)
(ROBOT ROBOTJ)
(PERSON JACK)


Grant's Robot Repair description:

(REASON (REPAIRING GRANT ROBOTG)
        (BROKEN ROBOTG))
(CAUSE (REPAIRING GRANT ROBOTG)
       (USING GRANT HANDTOOLSG))
(TOOL-SET HANDTOOLSG)
(DOOR DOORG)
(JOINTED ROBOTG)
(METALLIC ROBOTG)
(ROBOT ROBOTG)
(PERSON GRANT)


Here  are  two  predicate  calculus  descriptions  given  to  SME  to  illustrate  the  algorithm’s  operation. 

Figure  3:  Sample  Descriptions 


In  the  second  phase,  these  local  matches  are  coalesced  into  global  interpretations.  The  SME  algorithm 
combines  structurally  consistent  combinations  of  match  hypotheses  (i.e.,  sets  with  consistent  object 
bindings  and  consistent  relational  argument  assignments).  For  instance,  in  Figure  4  there  are  two  match 
hypotheses involving Grant, one which places him in correspondence with Jack because PERSON is true of
both of them, and another match hypothesis which places Grant in correspondence with ROBOTJ, because
both are agents of the same kind of action, REPAIRING. No interpretation of this comparison can include
both of these match hypotheses. Merging can be done exhaustively, producing all possible interpretations
(as in Falkenhainer, Forbus & Gentner 1986, 1989); however, we normally use a more psychologically
plausible  greedy  merge  algorithm,  which  produces  only  one  or  two  interpretations  and  operates  in  linear 
time  (Forbus  &  Oblinger  1990;  Forbus,  Ferguson,  &  Gentner,  1994). 
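The idea of merging only structurally consistent match hypotheses can be sketched as follows. This is a simplified illustration and not the greedy merge algorithm of Forbus & Oblinger (1990): it checks only the one-to-one constraint on object bindings, it sorts by score (so it is not linear time), and the scores and names used in the example are hypothetical.

```python
def consistent(mapping, pairs):
    """True if adding `pairs` keeps the object correspondences one-to-one."""
    forward = dict(mapping)
    backward = {t: b for b, t in mapping.items()}
    for b, t in pairs:
        if forward.get(b, t) != t or backward.get(t, b) != b:
            return False
        forward[b] = t
        backward[t] = b
    return True

def greedy_merge(hypotheses):
    """hypotheses: list of (name, score, object_pairs).

    Visits hypotheses from highest to lowest score and keeps each one
    whose object bindings do not conflict with those already accepted.
    """
    mapping, chosen = {}, []
    for name, _score, pairs in sorted(hypotheses, key=lambda h: -h[1]):
        if consistent(mapping, pairs):
            mapping.update(pairs)
            chosen.append(name)
    return chosen, mapping

# Hypothetical scores for three match hypotheses from the robot example.
example = [
    ("PERSON", 0.3, [("GRANT", "JACK")]),
    ("REPAIRING", 0.9, [("GRANT", "ROBOTJ"), ("ROBOTG", "CAR54")]),
    ("DOOR", 0.2, [("DOORG", "DOORJ")]),
]
chosen, mapping = greedy_merge(example)
```

In this example the PERSON hypothesis is rejected because Grant has already been placed in correspondence with ROBOTJ by the higher-scoring REPAIRING hypothesis.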


This  picture  illustrates  the  match  hypotheses  generated  for  a  pair  of  simple  descriptions.  Match  hypotheses 
are  shown  as  triangles.  Dashed  lines  indicate  the  base  and  target  items  each  match  hypothesis  places  in 
correspondence. The solid arrows leaving a match hypothesis indicate which others it relies upon to be
structurally consistent. Notice that one of the match hypotheses involving the occurrence of CAUSE in the
target  is  structurally  inconsistent,  because  its  arguments  cannot  be  aligned. 

Figure  4:  A  match  hypothesis  forest 


The  third  phase  is  structural  evaluation.  For  simplicity,  we  describe  this  stage  as  conceptually  distinct 
from  the  previous  stage,  although  it  is  actually  interleaved  with  building  interpretations,  since  its  results 
guide  the  greedy  merge  algorithm.  To  capture  human  preferences,  the  structural  evaluation  computation 
should favor interpretations with many matches over those with few matches and deep interpretations over
shallow interpretations. The first step is to assign an initial score to every match hypothesis. This helps
enforce the size preference. The systematicity preference is implemented via a trickle-down method: match
hypothesis scores are passed down to increment the scores of matching arguments.5 That is, if W(MH1) is
the score associated with a match hypothesis MH1, MH2 is a match hypothesis that applies to one of MH1's
arguments, and δ is the trickle-down factor, then W(MH2) is incremented as follows:

W(MH2) ← max{W(MH2) + δ W(MH1); 1.0}

This local computation causes scores to cascade downwards, providing higher values to those object
correspondences which support the alignment of large relational structures. The structural evaluation of a
global  interpretation  is  simply  the  sum  of  the  scores  of  the  match  hypotheses  which  comprise  its 
correspondences. 
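The trickle-down computation and the final summation can be sketched as follows. This is a minimal illustration, not the SME code: the initial scores, the value of the trickle-down factor, and the match hypothesis names are hypothetical. The sketch also reads the 1.0 in the update rule as a ceiling on scores (implemented with `min`), which is an interpretive assumption.

```python
def trickle_down(initial_scores, arguments, delta, order):
    """Propagate scores from each match hypothesis to its argument matches.

    initial_scores: {match_name: initial score}
    arguments:      {match_name: [names of matches between its arguments]}
    delta:          trickle-down factor
    order:          match names listed so that parents precede their arguments
    """
    w = dict(initial_scores)
    for parent in order:
        for child in arguments.get(parent, []):
            # Assumption: 1.0 acts as an upper bound on any score.
            w[child] = min(w[child] + delta * w[parent], 1.0)
    return w

def structural_evaluation(w, interpretation):
    """Sum the scores of the match hypotheses in a global interpretation."""
    return sum(w[name] for name in interpretation)

# Hypothetical three-level chain: a relation, its argument, and an object match.
initial = {"CAUSE": 0.25, "REPAIRING": 0.25, "GRANT~ROBOTJ": 0.25}
args = {"CAUSE": ["REPAIRING"], "REPAIRING": ["GRANT~ROBOTJ"]}
order = ["CAUSE", "REPAIRING", "GRANT~ROBOTJ"]
scores = trickle_down(initial, args, delta=0.5, order=order)
ses = structural_evaluation(scores, order)
```

Because each level adds a fraction of its parent's already incremented score, object matches at the bottom of deep relational structures end up with the highest scores, which is how the systematicity preference arises from a purely local rule.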


GM1: 10 correspondences, SES = 4.66
Object mappings:

DOORG <-> DOORJ
ROBOTG <-> CAR54
GRANT <-> ROBOTJ
HANDTOOLSG <-> HANDTOOLSJ
No candidate inferences.

Here  is  a  summary  of  the  best  interpretation  for  this  match  found  by  SME.  SES  refers  to  the  structural 
evaluation  of  the  interpretation. 

Figure  5:  Global  interpretations  for  the  example 


The  final  phase  is  the  computation  of  candidate  inferences.  Computing  candidate  inferences  requires 
knowing  the  set  of  correspondences,  so  this  takes  place  after  the  merge  operation.  Candidate  inferences  are 
generated  by  finding  non-corresponding  relational  structure  in  the  base  which  can  be  conjectured  to  hold  in 
the  target.  The  global  interpretations  built  for  the  comparison  of  Figure  3  are  shown  in  Figure  5.  In  this 
simple  example  there  are  no  candidate  inferences. 

It is important to note that the literal similarity computation can produce purely relational interpretations as
well as overall similarity interpretations, and that it can also produce purely surface interpretations.
It is simply a question of which collection of local matches wins. This reflects the human ability to process
a  novel  comparison  and  discover  only  after  the  fact  that  it  is  an  analogy.  We  assume  that  this  all-purpose 
literal  similarity  mode  is  the  normal  mode  of  similarity  processing  in  the  absence  of  specific  instructions. 
Consequently,  SME  creates  initial  local  matches  for  attribute  statements  as  well  as  for  relational 
statements. 


5  The  systematicity  preference  could  have  been  implemented  by  differentially  weighting  matches  at  different 
levels.  This  method  would  seem  to  require  a  computationally  implausible  'bird’s  eye'  view  of  the 
representations.  In  a  comparison  of  the  two  methods,  the  trickle-down  method  accounted  for  human 
soundness  ratings  better  than  treating  weights  directly  as  a  function  of  order  (Forbus  &  Gentner,  1989). 

For SME to play a major role in a model of similarity-based retrieval, it should be consistent with
psychological evidence. We have tested the psychological validity of SME as a simulation of analogical
processing in several ways. For instance, we compared SME's structural evaluation scores with human
soundness ratings for the Karla the hawk stories discussed below (Gentner & Landers, 1985; Rattermann
& Gentner, 1987). Like human subjects, SME rated analogical matches higher than surface matches
(Skorstad,  Falkenhainer,  and  Gentner,  1987).  The  patterns  of  preference  were  similar  across  story  sets: 
there  was  a  significant  positive  correlation  between  the  difference  scores  for  SME  and  those  for  human 
subjects,  where  the  difference  score  is  the  rating  for  analogy  minus  the  rating  for  surface  match  within  a 
given  story  set  (Gentner,  Rattermann  &  Forbus,  1993). 

Since retrievals occur frequently, components in a model of retrieval must be efficient. SME is quite
efficient. The generation of match hypotheses is O(n^2) on a serial machine, where n is the number of items
in base or target, and should typically be better than O(log(n)) on data-parallel machines.6 The generation
of global interpretations is roughly O(log(n)) on a serial machine, using the greedy merge algorithm of
Forbus & Oblinger (1990),7 and even faster parallel merge algorithms seem feasible.


3.2  The  FAC  stage 

The  FAC  stage  is  essentially  a  bank  of  SME  matchers,  all  running  in  parallel  in  literal  similarity  mode.8 
These  take  as  input  the  memory  descriptions  that  are  passed  forward  by  the  MAC  stage  and  compute  a 
structural  alignment  between  each  of  these  descriptions  and  the  probe.  The  other  component  of  the  FAC 
stage  is  a  selector  —  currently  a  numerical  threshold  —  which  chooses  some  subset  of  these  comparisons  to 
be  available  as  the  output  of  the  retrieval  system.  (See  Figure  1). 

The  FAC  stage  acts  as  a  structural  filter.  It  captures  the  human  sensitivity  to  structural  alignment  and 
inferential  potential  (subject  to  the  limited  and  possibly  surface-heavy  set  of  candidates  provided  by  the 
MAC  stage,  as  described  below).  Several  remarks  on  this  algorithm's  role  in  retrieval  are  in  order.  We  use 
the  literal  similarity  algorithm,  on  the  grounds  that  in  reminding  situations  people  can  respond  to  and 
identify  different  kinds  of  similarity.  (Recall  that  the  literal  similarity  computation  can  compute  relational 
similarity  or  object  similarity  as  well  as  overall  similarity).  This  choice  seems  ecologically  sound  because 
mundane  matches  are  often  reasonable  guides  to  action;  riding  a  new  bicycle,  for  instance,  is  like  riding 
other  bicycles  (Forbus  &  Gentner,  1986;  Gentner,  1989;  Medin  &  Ortony,  1989;  Medin  &  Ross,  1989). 
Finally,  this  choice  is  necessary  to  model  the  high  observed  frequency  of  surface  remindings.  These  surface 
remindings  would  mostly  be  rejected  if  FAC  were  strictly  an  analogy  matcher.  The  selector  for  the  FAC 
stage  must  choose  a  small  set  of  matches  for  subsequent  processing.  Currently  we  select  as  output  the  best 
match, based on its structural evaluation, and any others within 10% of it. We settled on the 10% criterion
because  it  generally  returns  a  single  result,  only  producing  multiple  results  when  there  are  two  extremely 
close  candidates.  However,  other  criteria  are  possible  and  we  have  experimented  with  broadening  the 
percentage,  selecting  a  fixed  number,  selecting  a  maximum  number  (if  capacity  limits  were  assumed)  and 
so  forth.  (One  class  of  these  experiments  is  described  in  Section  5.)  We  have  also  considered  adding  a 
threshold  to  the  selector,  so  that  if  the  best  outcome  is  too  weak  the  retrieval  system  returns  nothing. 
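The selector just described can be sketched as a small function. This is an illustration under assumptions: "within 10% of the best" is read as a score of at least 90% of the best score, and the function name, its parameters, and the example scores are hypothetical.

```python
def select_within(scores, fraction=0.10):
    """Return the best-scoring item and any others within `fraction` of it.

    scores: {item_name: structural evaluation score}
    """
    if not scores:
        return []
    best = max(scores.values())
    cutoff = best * (1.0 - fraction)
    winners = [name for name, score in scores.items() if score >= cutoff]
    # Present the strongest match first.
    return sorted(winners, key=lambda name: -scores[name])

# Hypothetical structural evaluation scores for three memory items.
close_race = select_within({"story-a": 4.66, "story-b": 4.30, "story-c": 2.00})
clear_winner = select_within({"story-a": 4.66, "story-b": 4.00})
```

Broadening `fraction`, or truncating the returned list to a fixed length, would correspond to the alternative selection criteria mentioned above.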

6  The  worst  case  parallel  time  would  be  O(n),  in  degenerate  cases  where  all  but  one  of  the  local  matches  is 
proposed  by  matching  arguments. 

7  The  original  exhaustive  merge  algorithm  was  worst  case  factorial  in  the  number  of  “clumps”  of  match 
hypotheses,  but  in  practice  was  often  quite  efficient.  See  Falkenhainer,  Forbus,  &  Gentner  (1989)  for 
details. 

8  In  our  current  implementation  SME  is  run  sequentially  on  each  candidate  item  in  turn,  but  this  is  an 
artifact  of  the  implementation. 

3.3 The MAC stage

The  MAC  stage  collects  the  initial  set  of  matches  between  the  probe  and  memory.  Like  the  FAC  stage,  the 
MAC  stage  conceptually  consists  of  a  set  of  matchers  and  a  selector  that  simply  returns  all  items  whose 
MAC score is within 10% of the best score given that probe. The challenge of the MAC stage is in the
design of its matcher. It must allow the probe to be compared quickly, in parallel, to a large pool of descriptions,
passing only a few on to the more expensive FAC stage. The rest of this section describes the design
and  implementation  of  the  MAC  matcher. 

Let  us  start  by  examining  in  more  detail  the  design  criteria  the  MAC  matcher  must  satisfy.  Ideally,  we 
would  like  the  most  similar  or  apt  memory  item  for  the  given  probe.  Clearly  running  SME  on  the  probe 
and  every  item  in  memory  would  provide  the  most  accurate  result.  Unfortunately,  even  though  SME  is  very 
efficient,  it  isn't  efficient  enough.  SME  operates  by  building  intermediate  structure,  in  the  form  of  the 
network  of  local  matches.  The  idea  of  building  such  networks  for  a  pair  of  items,  or  a  small  number  of 
pairs  of  items,  is  psychologically  plausible,  because  the  size  of  the  match  hypothesis  network  is  polynomial 
in  the  size  of  the  descriptions  being  matched.  This  means,  depending  on  one's  implementation  assumptions, 
that  a  fixed-size  piece  of  hardware  could  be  built  which  could  be  dynamically  reconfigured  to  represent  any 
local  match  network  for  input  descriptions  of  some  bounded  size.  What  is  not  plausible  is  that  such 
networks  could  be  built  between  a  probe  and  every  item  in  a  large  memory  pool,  and  especially  that  this 
could  happen  quickly  enough  in  neural  architectures  to  account  for  observed  retrieval  times. 

This  architectural  argument  suggests  that,  while  SME  in  literal  similarity  mode  is  fine  for  FAC,  MAC  must 
be  made  of  simpler  stuff.  To  escape  having  to  suffer  the  complexity  of  the  most  accurate  matcher  in  the 
“innermost  loop”  of  retrieval,  we  must  trade  accuracy  for  efficiency.  The  MAC  matcher  must  provide  a 
crude,  computationally  cheap  match  process  to  pare  down  the  vast  set  of  memory  items  into  a  small  set  of 
candidates  for  more  expensive  processing.  Ideally,  MAC’s  computations  should  be  simple  enough  to  admit 
plausible  parallel  and/or  connectionist  implementations  for  large-scale  memory  pools. 

What  is  the  appropriate  crude  estimator  of  similarity?  The  most  straightforward  method  would  be  to  count 
the  number  of  match  hypotheses  that  FAC  would  generate  in  comparing  a  probe  to  a  memory  item.  Let  us 
call  this  number  the  numerosity  of  a  comparison.  Numerosity  bears  a  rough  relation  to  the  potential  size  of 
the  global  interpretation,  since  the  more  local  matches  there  are,  the  larger  the  global  interpretation  could 
potentially  be.  However,  a  large  number  of  match  hypotheses  does  not  guarantee  a  large  global 
interpretation,  for  two  reasons.  First,  many  match  hypotheses  might  be  ungrounded  (recall  Section  3.1) 
and  hence  cannot  be  part  of  any  global  interpretation.  Second,  often  many  combinations  of  match 
hypotheses are ruled out by the 1:1 constraint, preventing the formation of large global interpretations.
Both reasons follow directly from the fact that numerosity is not structurally sensitive. However,
something  like  numerosity  is  at  least  a  crude  estimate  of  similarity. 

One  straightforward  way  to  implement  a  rough  similarity  estimator  would  be  to  calculate  numerosity  by 
building  the  actual  match  hypothesis  network  (e.g.,  to  carry  out  the  first  part  of  a  full  analogy  process)  for 
the  probe  and  each  memory  item,  and  then  count  the  match  hypotheses.  This  is  what  our  original  version 
of MAC/FAC did (Gentner, 1989b). It also is roughly what ARCS (Thagard et al., 1990) does. ARCS
models  retrieval  by  building  a  network  of  connections  similar  to  SME's  match  hypothesis  network  between 
the  probe  and  each  item  in  the  memory  pool  that  shares  a  semantically  similar  predicate  with  it.9  As  just 
discussed, we view these solutions as psychologically and computationally implausible. Even with parallel
and/or neural hardware, it is hard to see how to generate match hypothesis networks between a probe and
everything in a large pool of memory, while still providing realistic response times. A cheaper method is
required.

9 ARCS is based on Holyoak & Thagard's (1989) ACME, an analogy matcher which uses a localist
connectionist network similar to SME's match hypothesis network to construct a single interpretation of a
comparison via constraint satisfaction.

We  have  developed  a  novel  technique  for  estimating  the  degree  of  match  in  which  structured 
representations  are  encoded  as  content  vectors.  Content  vectors  are  flat  summaries  of  the  knowledge 
contained  in  complex  relational  structures.  The  content  vector  for  a  given  description  specifies  which 
functors  (i.e.,  relations,  connectives,  object  attributes,  functions,  etc.)  were  used  in  that  description  and  the 
number of times they occurred. Content vectors are assumed to arise automatically from structured
representations  and  to  remain  associated  with  them.  Content  vectors  are  a  special  form  of  feature  vectors. 

More precisely, let Π be the set of functors used in the descriptions that constitute memory items and
probes. We define the content vector of a structured description as follows. A content vector is an n-tuple
of numbers, each component corresponding to a particular element of Π. Given a description φ, the value
of each component of its content vector indicates how many times the corresponding element of Π occurs in
φ. Components corresponding to elements of Π which do not appear in statements of φ have the value zero.
One simple algorithm for computing content vectors is simply to count the number of occurrences of
each functor in the description. Thus if there were four occurrences of IMPLIES in a story, the value for the
IMPLIES component of its content vector would be four. (Figure 6 illustrates.) Thus content vectors are easy
to  compute  from  a  structured  representation  and  can  be  stored  economically  (using  sparse  encoding,  for 
instance,  on  serial  machines). 


Solar System: Structured representation

(CAUSE (GRAVITY (MASS SUN) (MASS PLANET))
       (ATTRACTS SUN PLANET))
(GREATER (TEMPERATURE SUN)
         (TEMPERATURE PLANET))
(CAUSE (AND (GREATER (MASS SUN) (MASS PLANET))
            (ATTRACTS SUN PLANET))
       (REVOLVE-AROUND PLANET SUN))

Solar System: Content Vector

(AND . 1)
(ATTRACTS . 1)
(CAUSE . 2)
(GRAVITY . 1)
(GREATER . 2)
(MASS . 2)
(OBJECTS . 2)
(REVOLVE-AROUND . 1)
(TEMPERATURE . 2)


Rutherford Atom: Structured representation

(CAUSE (OPPOSITE-SIGN (CHARGE NUCLEUS) (CHARGE ELECTRON))
       (ATTRACTS NUCLEUS ELECTRON))
(REVOLVE-AROUND ELECTRON NUCLEUS)
(GREATER (MASS NUCLEUS)
         (MASS ELECTRON))

Rutherford Atom: Content Vector

(ATTRACTS . 1)
(CAUSE . 1)
(CHARGE . 2)
(GREATER . 1)
(MASS . 2)
(OBJECTS . 2)
(OPPOSITE-SIGN . 1)
(REVOLVE-AROUND . 1)


Here are some simple predicate calculus representations and the corresponding content vectors. A simple
counting algorithm is used here; in the simulation these are normalized to unit vectors.

Figure  6:  Sample  representations  with  content  vectors 
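The counting algorithm can be sketched in Python as follows. This is an illustration, not the simulation code. The tuple encoding and the function name are assumptions, and so is the choice to count each distinct expression once: that choice reproduces the functor counts shown in Figure 6, whereas counting every textual occurrence in the solar system description would give 4 for MASS and 2 for ATTRACTS. The OBJECTS entry of the figure is not computed by this sketch.

```python
from collections import Counter

def content_vector(statements):
    """Count, for each functor, the distinct expressions that use it.

    Statements are nested tuples (functor, arg, ...); entities are strings.
    """
    seen = set()

    def walk(item):
        if isinstance(item, tuple) and item not in seen:
            seen.add(item)
            for arg in item[1:]:
                walk(arg)

    for statement in statements:
        walk(statement)
    return Counter(expr[0] for expr in seen)

SOLAR_SYSTEM = [
    ("CAUSE",
     ("GRAVITY", ("MASS", "SUN"), ("MASS", "PLANET")),
     ("ATTRACTS", "SUN", "PLANET")),
    ("GREATER", ("TEMPERATURE", "SUN"), ("TEMPERATURE", "PLANET")),
    ("CAUSE",
     ("AND",
      ("GREATER", ("MASS", "SUN"), ("MASS", "PLANET")),
      ("ATTRACTS", "SUN", "PLANET")),
     ("REVOLVE-AROUND", "PLANET", "SUN")),
]

RUTHERFORD_ATOM = [
    ("CAUSE",
     ("OPPOSITE-SIGN", ("CHARGE", "NUCLEUS"), ("CHARGE", "ELECTRON")),
     ("ATTRACTS", "NUCLEUS", "ELECTRON")),
    ("REVOLVE-AROUND", "ELECTRON", "NUCLEUS"),
    ("GREATER", ("MASS", "NUCLEUS"), ("MASS", "ELECTRON")),
]

solar_vector = content_vector(SOLAR_SYSTEM)
atom_vector = content_vector(RUTHERFORD_ATOM)
```

A `Counter` stores only the functors that actually occur, so it also serves as the kind of sparse encoding suited to serial machines.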

How good an approximation is the content vector dot product to what SME would produce? Suppose
content  vectors  were  generated  using  the  simple  counting  algorithm  described  above.  Then  the  product  of 
each  corresponding  component  is  an  overestimate  of  the  number  of  match  hypotheses  that  would  be  created 
between  functors  of  that  type,  because  it  does  not  take  into  account  the  cases  when  the  arguments  to  the 
match  hypotheses  could  not  be  aligned.  There  is  also  a  possibility  of  underestimation,  since  the  dot  product 
does  not  take  into  account  matches  between  non-identical  functions  and  entities,  since  discovering  those 
matches  requires  tracing  predicate  bindings.  However,  in  practice,  the  number  of  entity  and  non-identical 
function  matches  tends  to  be  smaller  than  the  number  of  ungrounded  matches,  so  overall  the  dot  product 
tends  to  overestimate  numerosity,  and  hence  will  tend  to  be  an  overestimate  of  what  SME  would  produce. 

The  dot  product  of  content  vectors  provides  exactly  the  computational  basis  the  MAC  stage  needs.  It  could 
be  implemented  efficiently  for  large  memories  using  a  variety  of  massively  parallel  computation  schemes. 
For  instance,  connectionist  memories  can  be  built  which  find  the  closest  feature  vector  to  a  probe  (Hinton 
&  Anderson,  1989).  Therefore  the  MAC  stage  can  scale  up. 

To  summarize,  the  MAC  matcher  works  as  follows:  Each  memory  item  has  a  content  vector  stored  with 
it.10  When  a  probe  enters,  its  content  vector  is  computed.  A  score  is  computed  for  each  item  in  the  memory 
pool  by  taking  the  dot  product  of  its  content  vector  with  the  probe's  content  vector.  The  MAC  selector  then 
produces  as  output  the  best  match  and  everything  within  10%  of  it,  as  described  above.  (As  for  the  FAC 
stage,  variants  that  could  be  considered  include  adding  a  bound  on  the  number  of  items  returned  (to  model 
capacity  limitations)  and  implementing  a  threshold  on  the  MAC  selector  so  that  if  every  match  is  too  low 
MAC  returns  nothing). 
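The MAC matcher and selector summarized above can be sketched as follows. This is a minimal serial illustration, not the simulation code or a parallel implementation: the function names are hypothetical, content vectors are represented as dictionaries, and "within 10%" is read as a score of at least 90% of the best score. The two example vectors are copied from Figure 6.

```python
import math

def normalize(vector):
    """Scale a content vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vector.values()))
    return {k: v / norm for k, v in vector.items()} if norm else dict(vector)

def dot(u, v):
    """Dot product over the functors the two vectors share."""
    return sum(u[k] * v[k] for k in u.keys() & v.keys())

def mac(probe_vector, memory, fraction=0.10):
    """Score every memory item against the probe and keep the top band.

    memory: {item_name: content vector}
    Returns {item_name: score} for the best item and those within `fraction` of it.
    """
    if not memory:
        return {}
    probe = normalize(probe_vector)
    scores = {name: dot(probe, normalize(vec)) for name, vec in memory.items()}
    best = max(scores.values())
    return {name: s for name, s in scores.items() if s >= best * (1.0 - fraction)}

# Content vectors from Figure 6.
SOLAR = {"AND": 1, "ATTRACTS": 1, "CAUSE": 2, "GRAVITY": 1, "GREATER": 2,
         "MASS": 2, "OBJECTS": 2, "REVOLVE-AROUND": 1, "TEMPERATURE": 2}
ATOM = {"ATTRACTS": 1, "CAUSE": 1, "CHARGE": 2, "GREATER": 1, "MASS": 2,
        "OBJECTS": 2, "OPPOSITE-SIGN": 1, "REVOLVE-AROUND": 1}

raw_overlap = dot(SOLAR, ATOM)
retrieved = mac(SOLAR, {"solar-system": SOLAR, "rutherford-atom": ATOM})
```

Each score depends only on the probe and one memory item, so the loop over memory could in principle be distributed across parallel units; only the final comparison against the best score requires a global step.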

Like  other  feature-vector  schemes,  the  dot  product  of  content  vectors  does  not  take  the  actual  relational 
structure  into  account.  It  only  calculates  a  numerical  score,  and  hence  doesn't  produce  the  correspondences 
and  candidate  inferences  which  provide  the  power  of  analogical  reasoning  and  learning.  But  the  output  of 
MAC  feeds  to  the  FAC  stage,  which  operates  on  structured  representations.  Thus  it  is  the  FAC  stage  which 
both  filters  out  structurally  unsound  remindings  and  produces  the  desired  correspondences  and  candidate 
inferences.  We  claim  that  the  interplay  of  the  cheap  but  dumb  computations  of  the  MAC  stage  and  the 
more  expensive  but  structurally  sensitive  computations  of  the  FAC  stage  explains  the  psychological 
phenomena  of  Section  2.  As  the  first  step  in  supporting  this  claim,  we  next  demonstrate  that  MAC/FAC's 
behavior  provides  a  good  approximation  of  psychological  data. 


4.  Cognitive  Simulation  Experiments 

In  this  section  we  compare  the  performance  of  MAC/FAC  with  that  of  human  subjects,  using  the  “Karla 
the  Hawk”  stories  (Gentner,  Rattermann  &  Forbus,  1993;  Rattermann  &  Gentner,  1987,  Experiment  2). 
For  these  studies,  we  wrote  sets  of  stories  consisting  of  base  stories  plus  four  variants,  created  by 
systematically  varying  the  kind  of  commonalities.  All  stories  shared  first-order  relations  (primarily  events), 
but  varied  in  which  other  commonalities  were  present,  as  shown  in  Table  1.  The  LS  (literal  similarity) 
stories shared both higher-order relational structure and object attributes. The AN stories shared
higher-order relational structure but contained different attributes, while the SF stories shared attributes but
contained different higher-order relational structure. The FOR stories differed both in attributes and
higher-order relational structure.

10 We normalize content vectors to unit vectors, both to reduce the sensitivity to overall size of the
descriptions and because we assume that psychologically plausible implementation substrates for
MAC/FAC (e.g., neural systems) will involve processing units of limited dynamic range.


        Common 1st-order    Common higher-order    Common object
        relations           relations              attributes
LS:     Yes                 Yes                    Yes
SF:     Yes                 No                     Yes
AN:     Yes                 Yes                    No
FOR:    Yes                 No                     No

Table 1: Types of stories used in the “Karla the Hawk” experiments


In  this  study,  the  subjects  were  first  given  32  stories  to  remember,  of  which  20  were  base  stories  and  12 
were  distractors.  They  were  later  presented  with  20  probe  stories  which  matched  the  base  stories  as 
follows:  five  LS  matches,  five  AN  matches,  five  SF  matches  and  five  FOR  matches.  They  were  told  to 
write  down  any  prior  stories  of  which  they  were  reminded.  (Which  stories  were  in  each  similarity  condition 
was  varied  across  subjects.)  As  shown  in  Table  2  the  proportions  of  remindings  for  different  match  types 
were .56 for LS, .53 for SF, .12 for AN and .09 for FOR. Table 2 also shows that this retrievability order
has been stable across three variations of this study: LS > SF > AN > FOR.11


Condition    Proportion
LS           0.56
SF           0.53
AN           0.12
FOR          0.09

Table 2: Proportion of remindings for different match types: Human subjects


As  discussed  above,  this  retrievability  order  differs  strikingly  from  the  soundness  ordering.  When  subjects 
were  asked  to  rate  how  sound  the  matches  were  -  how  well  the  inferences  from  one  story  would  apply  to 
the  other  -  they  rated  analogy  (AN)  and  literal  similarity  (LS)  as  significantly  more  sound  than  surface 
similarity  (SF)  and  FOR  matches  (matches  based  only  on  common  first-order  relations,  primarily  events). 
SME  running  in  analogy  mode  on  SF  and  AN  matches  correctly  reflected  human  soundness  rankings 
(Forbus & Gentner, 1989; Gentner et al, in press; Skorstad et al, 1987). Here we seek to capture human
retrieval patterns: Does MAC/FAC duplicate the human propensity for retrieving SF and LS matches
rather than AN and FOR matches? The idea is to give MAC/FAC a memory set of stories, then probe with
various new stories. To count as a retrieval, a story must make it through both MAC and FAC. We use
replication of the ordering found in the psychological data, rather than the exact percentages, as our
criterion for success because this measure is more robust, being less sensitive to the detailed properties of
the databases.

11 LS and SF did not differ significantly in retrievability. In Experiment 2, AN and FOR did not differ
significantly, although in Experiment 1 AN matches were better retrieved than FOR matches.

For  the  computational  experiments,  we  encoded  predicate  calculus  representations  for  9  of  the  20  story  sets 
(45  stories).  Figure  7  shows  one  of  the  story  representations.  These  stories  are  used  in  all  three 
experiments  described  below. 


Figure  7:  A  representation  from  the  Karla  The  Hawk  story  set 

(PROMISE MAN1 KARLA
         (NOT (ATTACK MAN1 KARLA)))
(ATTACK MAN1 DEER))

(CAUSE
  (EQUALS (HAPPINESS MAN1) HIGH)
  (PROMISE MAN1 KARLA
           (NOT (ATTACK MAN1 KARLA))))
(CAUSE
  (OBTAIN MAN1 FEATHERS)
  (EQUALS (HAPPINESS MAN1) HIGH))
(FOLLOW
  (OFFER KARLA FEATHERS MAN1)
  (OBTAIN MAN1 FEATHERS))
(CAUSE
  (REALIZE KARLA
           (DESIRE MAN1 FEATHERS))
  (OFFER KARLA FEATHERS MAN1))
(SUCCESS


4.1 Cognitive Simulation Experiment 1

In  our  first  study,  we  put  the  9  base  stories  in  memory,  along  with  the  9  FOR  stories  which  served  as 
distractors.  We  then  used  each  of  the  variants  -  LS,  SF,  and  AN  -  as  probes.  This  roughly  resembles  the 
original  task,  but  MAC/FAC's  job  is  easier  in  that  (1)  it  has  only  18  stories  in  memory,  while  subjects  had 
32,  in  addition  to  their  vast  background  knowledge;  (2)  subjects  were  tested  after  a  week's  delay,  so  that 
there  could  have  been  some  degradation  of  the  memory  representations. 


Table  3  shows  the  proportion  of  times  the  base  story  made  it  through  MAC  and  (then)  through  FAC.  The 
FAC  output  is  what  corresponds  to  human  retrievals.  MAC/FAC’s  performance  is  much  better  than  that  of 
the  human  subjects,  perhaps  partly  because  of  the  differences  noted  above.  However,  the  key  point  is  that 
its  results  show  the  same  ordering  as  those  of  human  subjects:  LS  >  SF  >  AN. 


Probes    MAC     FAC
LS        1.0     1.0
SF        0.89    0.89
AN        0.67    0.67

Memory contains 9 base stories and 9 FOR matches; probes were the 9 LS, 9 SF, and 9 AN stories.
The rows show proportion of times the correct base story was retrieved for different probe types.

Table 3: Proportion of correct retrievals given different kinds of probes


4.2  Cognitive  Simulation  Experiment  2 

To  give  MAC/FAC  a  harder  challenge,  we  put  the  four  variants  of  each  base  story  into  memory.  This 
made  a  larger  memory  set  (36  stories)  and  also  one  with  many  competing  similar  choices.  Each  base  story 
in  turn  was  used  as  a  probe.  This  is  almost  the  reverse  of  the  task  subjects  faced,  and  is  more  difficult. 

Table  4  shows  the  mean  number  of  matches  of  different  similarity  types  that  succeed  in  getting  through 
MAC  and  (then)  through  FAC.  There  are  several  interesting  points  to  note  here.  First,  the  retrieval  results 
(i.e.,  the  number  that  make  it  through  both  stages)  ordinally  match  the  results  for  human  subjects:  LS  >  SF 
>  AN  >  FOR.  This  degree  of  fit  is  encouraging,  given  the  difference  in  task.  Second,  as  expected,  MAC 
produces  some  matches  that  are  rejected  by  FAC.  This  number  depends  partly  on  the  criteria  for  the  two 
stages.  Here,  with  MAC  and  FAC  both  set  at  10%,  the  mean  number  of  memory  items  produced  by  MAC 
is  3.4,  and  the  mean  number  accepted  by  FAC  is  1.6.  Third,  as  expected,  FAC  succeeds  in  acting  as  a 
structural  filter  on  the  MAC  matches.  It  accepts  all  of  the  LS  matches  MAC  proposes  and  some  of  the 
partial  matches  (i.e.,  SF  and  AN),  while  rejecting  most  of  the  inappropriate  matches  (i.e.,  FOR  and 
matches  with  stories  from  other  sets). 


Retrievals    MAC     FAC
LS            0.78    0.78
SF            0.78    0.44
AN            0.33    0.22
FOR           0.22    0.0
Other         1.33    0.22

Memory contains 36 stories (LS, SF, AN, and FOR for 9 story sets); the 9 base stories were used as probes.
Other = any retrieval from a story set different from the one to which the base belongs.

Table 4: Mean numbers of different match types retrieved per probe when base stories are used as probes

4.3  Cognitive  Simulation  Experiment  3 

In  the  prior  simulations,  LS  matches  were  the  resounding  winner.  While  this  is  reassuring,  it  is  also 
interesting  to  know  which  matches  would  be  retrieved  if  there  were  no  perfect  overall  matches.  Therefore 
we  removed  the  LS  variants  from  memory  and  repeated  the  second  simulation  experiment,  again  probing 
with the base stories. As Table 5 shows, SF matches are now the clear winners in both the MAC and FAC
stages.  Again,  the  ordinal  results  match  well  with  those  of  subjects:  SF  >  AN  >  FOR. 


Retrievals    MAC     FAC
SF            0.88    0.78
AN            0.56    0.56
FOR           0.22    0.11
Other         1.11    0.11

Memory contains 27 stories (9 SF, 9 AN, 9 FOR); 9 base stories used as probes.

Table 5: Mean numbers of different match types retrieved per probe with base stories as probes and
no LS stories in memory


4.4  Summary  of  Cognitive  Simulation  Experiments 


The  results  are  encouraging.  First,  MAC/FAC's  retrieval  results  (i.e.,  the  number  that  make  it  through  both 
stages)  ordinally  match  the  results  for  human  subjects:  LS  >  SF  >  AN  >  FOR.  Second,  as  expected,  MAC 
produces  some  matches  that  are  rejected  by  FAC.  The  mean  number  of  memory  items  produced  by  MAC  is 
3.4  and  the  mean  number  accepted  by  FAC  is  1.6.  Third,  FAC  succeeds  in  its  job  as  a  structural  filter  on 
the  MAC  matches.  It  accepts  all  of  the  LS  matches  proposed  by  MAC  and  some  of  the  partial  matches 
(the  SF,  AN  and  FOR  matches),  and  rejects  most  of  the  inappropriate  matches  (the  “other”  matches  from 
different  story  sets).  It  might  seem  puzzling  that  FAC  accepts  more  SF  matches  than  AN  matches,  when  it 
normally  would  prefer  AN  over  SF.  The  reason  is  that  it  is  not  generally  being  offered  this  choice.  Rather, 
it  must  choose  the  best  from  the  matches  passed  on  by  MAC  for  a  given  probe  (which  might  be  AN  and 
LS,  or  SF  and  LS,  for  example). 

It is useful to compare MAC/FAC's performance with that of Thagard, Holyoak, Nelson & Gochfeld's
(1990) ARCS model of similarity-based retrieval, the most comparable alternate model. Thagard et al gave
ARCS the Karla the Hawk story in memory along with 100 fables as distractors. When given the four
similarity  variants  as  probes,  ARCS  produced  asymptotic  activations  as  follows:  LS  (.67),  FOR  (-.11),  SF 
(-.17),  AN  (-.27).  ARCS  thus  exhibits  at  least  two  violations  of  the  LS  >  SF  >  AN  >  FOR  order  found  for 
human  remindings.  First,  SF  remindings,  which  should  be  about  as  likely  as  LS  remindings,  are  quite 
infrequent  in  ARCS  —  less  frequent  than  even  the  FOR  matches.  Second,  AN  matches  are  less  frequent 
than  FOR  matches  in  ARCS,  whereas  for  humans  AN  was  always  ordinally  greater  than  FOR  and  (in 
Experiment  1)  significantly  so.  Thus  MAC/FAC  explains  the  data  better  than  ARCS.  This  is  especially 
interesting  because  Thagard  et  al  argue  that  a  complex  localist  connectionist  network  which  integrates 
semantic,  structural,  and  pragmatic  constraints  is  required  to  model  similarity-based  reminding.  While 
such  models  are  intriguing,  MAC/FAC  shows  that  a  simpler  model  can  provide  a  better  account  of  the 
data.  We  compare  MAC/FAC  with  ARCS  in  more  detail  in  Section  6. 

Finally,  and  most  importantly,  MAC/FAC's  overall  pattern  of  behavior  captures  the  motivating  phenomena. 
It  allows  for  structured  representations  and  for  processes  of  structural  alignment  and  mapping  over  these 
representations, thus satisfying the structural representation and structured mappings criteria. It produces
fewer analogical matches than literal similarity or surface matches, thus satisfying the existence of rare
insights criterion. The majority of its retrievals are LS matches, thus satisfying the primacy of the
mundane criterion. It also produces a fairly large number of SF matches, thus satisfying the surface
superiority  criterion.  Finally,  its  algorithms  are  simple  enough  to  apply  over  large-scale  memories,  thus 
satisfying  the  scalability  criterion. 


5.  Sensitivity  Analyses 

The  experiments  of  the  previous  section  show  that  the  MAC/FAC  model  can  account  for  psychological 
retrieval  data.  This  section  looks  more  closely  into  why  it  does,  by  seeing  how  sensitive  the  results  are  to 
different  factors  in  the  model.  These  analyses  are  similar  in  spirit  to  those  carried  out  by  VanLehn  (1989) 
in  his  SIERRA  project.  VanLehn  used  his  model  to  generate  different  possible  learning  sequences  to  see  if 
these  variations  covered  the  space  of  observed  mistakes  made  by  human  learners  in  subtraction  problems. 
Thus  variations  in  the  model  were  used  to  generate  hypotheses  about  the  space  of  individual  differences. 
Our  methodology  is  quite  similar,  in  that  we  vary  aspects  of  our  model  in  order  to  better  understand  how  it 
accounts  for  data.  The  key  difference  is  that  we  are  not  attempting  to  model  individual  differences,  but 
instead  are  investigating  how  our  results  depend  on  different  aspects  of  the  theory.  Such  sensitivity 
analyses  are  routinely  used  in  other  areas  of  science  and  engineering;  we  believe  they  are  also  an  important 
tool  for  cognitive  modeling. 

Sensitivity  analyses  can  provide  insight  into  why  a  simulation  works.  Any  working  cognitive  simulation 
rests  on  a  large  number  of  design  choices.  Examples  of  design  choices  include  the  setting  of  parameters, 
the  kinds  of  data  provided  as  input,  and  even  the  particular  algorithms  used.  Some  of  these  design  choices 
are  forced  by  the  theory  being  tested,  some  choices  are  only  weakly  constrained  by  the  theory,  and  others 
are  irrelevant  to  the  theory  being  tested,  but  are  necessary  to  create  a  working  artifact.  Sensitivity  analyses 
can  help  verify  that  the  source  of  a  simulation's  performance  rests  with  the  theoretically  important  design 
choices.  Varying  theoretically  forced  choices  should  lead  to  a  degradation  of  the  simulation's  ability  to 
replicate  human  performance.  Otherwise,  the  source  of  the  performance  lies  elsewhere.  On  the  other  hand, 
varying theoretically irrelevant choices should not affect the results; if it does, it suggests that
something  other  than  the  motivating  theory  is  responsible  for  the  simulator's  performance.  Finally,  seeing 
how  the  ability  to  match  human  performance  varies  with  parameters  that  are  only  weakly  constrained  by 
theory  can  lead  to  insights  about  why  the  model  works. 

In  the  rest  of  this  section,  we  describe  a  series  of  sensitivity  experiments  on  MAC/FAC.  These  experiments 
demonstrate  that  its  ability  to  replicate  human  performance  is  robust,  and  that  this  ability  depends  crucially 
on  the  theoretically  important  design  choices.  We  first  describe  the  methodology  used  in  these  experiments 
in  detail,  and  then  describe  three  sensitivity  analyses. 


5.1 Method for Sensitivity Analyses

A  sensitivity  analysis  requires  a  standard  of  comparison,  a  baseline  against  which  to  judge  the  results  of 
variations. We use as our baseline the simulation experiments described in Section 4. We say that a
particular  set  of  design  choices  satisfies  the  data  if  re-running  the  simulation  experiments  with  that  set  of 
design  choices  yields  results  which  match  the  human  data.  That  is,  the  frequency  of  retrievals  must  follow 
the  pattern  LS  >  SF  >  AN  >  FOR. 
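
The "satisfies the data" test reduces to an ordering check, which might be sketched as follows. This is a hedged illustration: the function name is hypothetical, and the use of strict inequalities is an assumption (the text does not say how ties are treated). The first example uses the human proportions reported in Section 4; the second is invented to show a violation.

```python
def satisfies_data(freq):
    """True when retrieval frequencies follow LS > SF > AN > FOR (strict, by assumption)."""
    order = ["LS", "SF", "AN", "FOR"]
    return all(freq[a] > freq[b] for a, b in zip(order, order[1:]))

# Human proportions of remindings reported in Section 4.
human = {"LS": 0.56, "SF": 0.53, "AN": 0.12, "FOR": 0.09}

# Hypothetical frequencies that violate the ordering (AN above SF and LS).
violation = {"LS": 0.2, "SF": 0.3, "AN": 0.5, "FOR": 0.1}

print(satisfies_data(human), satisfies_data(violation))
```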

There  are  many  design  choices  which  could  be  explored  via  sensitivity  analyses.  Conceptually,  one  can 
think of sets of design choices as points in a high dimensional space. In essence, the simulation studies of
Section  4  provide  information  about  one  point  in  the  design  space.  This  metaphor  is  excellent  for  choices 
of  numerical  parameters,  since  these  dimensions  can  be  viewed  as  continuous.  This  metaphor  is  not  as 
useful for other kinds of design choices, e.g., algorithmic choices, since systematically enumerating the set
of  plausible  algorithms  for  a  task  is  quite  difficult.  To  best  visualize  the  results,  choosing  two  numerical 
dimensions  to  vary  allows  patterns  of  satisfaction  to  be  displayed  as  a  table,  whose  entries  represent 
measurements  of  the  ability  of  the  model  to  satisfy  the  data  at  sampled  points  in  the  design  space. 

The  two  most  interesting  numerical  parameters  with  respect  to  sensitivity  analyses  on  MAC/FAC  are  the 
selector  widths  for  the  MAC  and  FAC  stages,  since  these  are  only  weakly  constrained  by  the  theory.  They 
should  be  narrow,  in  order  to  reject  inappropriate  remindings,  but  we  currently  see  no  theoretically 
motivated  method  to  calculate  precise  predictions  for  these  parameters.  Therefore  in  these  analyses  we  use 
an  empirical  approach.  We  vary  the  selector  widths,  using  these  variations  as  the  axes  of  a  subset  of  the 
design  space.  Recall  that  a  selector  of  width  W  accepts  all  matches  within  W  %  of  the  largest  input.  That 
is,  a  selector  with  width  10  %  outputs  the  best  match  plus  any  other  matches  that  are  within  10  %  of  the 
score  of  the  best  match,  while  a  selector  of  width  100  %  will  simply  pass  through  all  of  its  inputs.  In  the 
experiments  below,  selector  widths  for  both  MAC  and  FAC  are  varied  from  1  to  100%,  in  10% 
increments.  Each  entry  in  the  table  indicates  whether  that  pair  of  width  settings,  combined  with  the  other 
design  choices,  satisfied  the  data.  When  the  pattern  of  retrieval  is  violated,  the  table  entry  contains 
information  about  the  particular  kind  of  violation. 
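
As a rough sketch of the design space being sampled, the selector and the grid of width pairs could be written as below. The names are hypothetical, and the reading of "1 to 100%, in 10% increments" as the eleven values 1, 10, 20, ..., 100 is an assumption, chosen because it yields the 121 width pairs mentioned later in this section. The selector also assumes non-negative scores.

```python
def select(scores, width_percent):
    """Keep every score within width_percent of the largest score (scores assumed >= 0)."""
    best = max(scores)
    cutoff = best * (1.0 - width_percent / 100.0)
    return [s for s in scores if s >= cutoff]

# Assumed sampling of selector widths: 1%, then 10% to 100% in steps of 10%.
widths = [1] + list(range(10, 101, 10))

# Each (MAC width, FAC width) pair is one sampled point in the design space.
grid = [(mac_w, fac_w) for mac_w in widths for fac_w in widths]

print(len(grid))
print(select([0.9, 0.85, 0.5], 10))    # narrow selector
print(select([0.9, 0.85, 0.5], 100))   # passes everything through
```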

Viewed  as  a  map,  the  table  of  results  from  the  sensitivity  analysis  can  be  divided  into  viable  regions, 
subspaces of design choices which allow the model to satisfy the data, and non-viable regions, where they
do  not.  The  existence  of  viable  regions  is  of  course  critical  for  a  successful  simulation.  However,  the 
nature  of  the  non-viable  regions  is  also  interesting,  because  they  provide  a  source  of  insight  into  why  the 
model  works.  Seeing  how  a  bridge  collapses  after  replacing  a  particular  strut  with  a  weaker  material 
(preferably  via  simulation)  supports  the  conclusion  that  the  strength  of  that  strut  was  a  factor  in  preventing 
collapse. 

It should be noted that the computational cost of these experiments is large but not horrendous.
Essentially, the cognitive simulation experiments of Sections 4.1 and 4.2 were replicated for each pair of
selector widths, i.e., 121 times. Each repetition required running the MAC matcher 810 times,12 for a total
of  98,010  times.  The  number  of  FAC  executions  varies  with  the  size  of  the  set  output  from  MAC,  of 
course,  and  varied  substantially  according  to  the  particular  design  choices  made  (as  shown  below).  A 
reasonably accurate estimate for the lower bound of FAC executions for each experiment is 900 and a
reasonable upper bound is 1,600. The MAC matcher takes roughly 0.002 seconds for each pair of content
vectors and the FAC matcher (i.e., SME) takes between 1.0 and 11 seconds for each pair of structured
representations,  with  an  average  time  of  roughly  4  seconds.13  Thus  the  time  to  run  MAC/FAC  for  each 
probe  typically  ranges  from  3  to  10  seconds.  A  naive  system  for  doing  sensitivity  analyses  could  use  as 
much  as  five  hours  per  analysis  (14,000  seconds  for  MAC,  4,000  seconds  for  FAC).  However,  we  found 
that  by  caching  the  results  of  matches  in  a  simple  database,  we  could  cut  the  CPU  requirements  for  these 
analyses  considerably. 
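
The execution counts above, and the effect of caching, can be checked with a short sketch. The arithmetic uses only figures stated in the text and in footnote 12. The caching part is an assumption-laden illustration: `functools.lru_cache` stands in for the "simple database", and `cached_match` with its placeholder score is hypothetical, not the actual matcher.

```python
from functools import lru_cache

# MAC executions per repetition (figures from footnote 12).
exp1_runs = 18 * 27          # 18 stories in memory, 27 probes
exp2_runs = 36 * 9           # 36 stories in memory, 9 probes
runs_per_setting = exp1_runs + exp2_runs
width_settings = 11 * 11     # pairs of selector widths
total_mac_runs = runs_per_setting * width_settings

match_calls = {"count": 0}

@lru_cache(maxsize=None)
def cached_match(probe_id, item_id):
    """Placeholder matcher; a real system would run the MAC or FAC matcher here."""
    match_calls["count"] += 1
    return float(len(set(probe_id) & set(item_id)))

# The match score does not depend on the selector widths, so repeating the
# experiment for every width pair only computes each distinct pair once.
probes = ["p1", "p2"]
items = ["m1", "m2", "m3"]
for _ in range(width_settings):
    for p in probes:
        for m in items:
            cached_match(p, m)

print(runs_per_setting, total_mac_runs, match_calls["count"])
```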


12  The  first  experiment  involves  486  MAC  executions  because  there  are  18  stories  in  memory  and  27 
probes.  The  second  experiment  involves  324  MAC  executions  because  there  are  36  stories  in  memory  and 
9  probes. 

13  These  times  are  for  an  IBM  RS/6000  Model  350,  SME3b,  which  was  used  in  all  experiments  in  this 
section.  An  earlier  version  of  SME  was  used  in  Forbus  &  Gentner  1991,  and  in  the  experiments  in  Section 
6. 




5.2  Sensitivity  Analysis  One:  Robustness 

In  this  experiment  we  tested  the  robustness  of  MAC/FAC's  ability  to  satisfy  the  data  by  varying  the 
selector  widths.  Table  6  shows  the  results.  Notice  that  there  is  one  region  which  satisfies  the  data:  When 
the  MAC  width  is  between  10  %  and  20  %  and  FAC  is  at  least  10  %.  The  moderately  large  viable 
subspace  indicates  that  MAC/FAC's  performance  is  robust,  and  not  hostage  to  a  particular  choice  of 
selector  width  settings. 

Y   = Satisfies the data
4   = No analogies
64  = SF < AN
80  = LS < AN, SF < AN
112 = LS < AN, SF < AN, LS < FOR
368 = LS < AN, SF < AN, LS < FOR, LS < DT

Rows  are  the  width  of  the  MAC  selector.  Columns  correspond  to  the  width  of  the  FAC  selector.  The  codes 
describe  whether  that  combination  of  selector  widths  allows  MAC/FAC  to  account  for  the  human  data,  and  if  not, 
what  criteria  were  violated. 

Table  6:  Sensitivity  to  selector  width,  normalized  content  vectors 
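
The numeric codes in the legend appear to be sums of powers of two, one per violated criterion (for example, 80 = 64 + 16). The source does not state this, so the following decoder is only an inferred reading of the legend; the flag table and function name are hypothetical.

```python
# Inferred single-criterion flags (an assumption derived from the legend's arithmetic).
FLAGS = {
    4: "No analogies",
    16: "LS < AN",
    32: "LS < FOR",
    64: "SF < AN",
    256: "LS < DT",
}

def decode(code):
    """List the violated criteria encoded in a table entry."""
    return [label for bit, label in sorted(FLAGS.items()) if code & bit]

for code in (4, 64, 80, 112, 368):
    print(code, decode(code))
```

Under this reading, every composite entry in the legend decomposes into the same criteria that the legend lists for it.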


As  discussed  above,  it  is  important  to  show  that  there  are  parameter  settings  that  do  not  fit  the  human  data, 
to  establish  that  the  theoretical  variables  actually  matter.  When  either  MAC  or  FAC  is  too  narrow  (i.e., 
MAC  of  1  %  or  FAC  of  1  %),  analogies  are  never  retrieved.  This  violates  the  rare  insights  criterion. 

When MAC is broad (30% or larger), making FAC too broad leads first to too many analogies, and then
to junk remindings. The shape of the region of viability suggests that while FAC is necessary to provide
structural matching and candidate inferences, MAC provides most of the filtering. Since that is MAC's
intended  purpose,  this  provides  further  evidence  that  the  simulation  works  according  to  the  principles  of  its 
design,  rather  than  some  unknown  factor. 

The  evidence  that  the  results  are  not  very  sensitive  to  the  particular  choice  of  selector  widths  in  the  original 
experiments  (i.e.,  10  %  for  both  MAC  and  FAC)  is  reassuring.  The  next  two  sensitivity  analyses  explore 
other  design  choices,  using  the  same  methodology  as  this  experiment. 

5.3 Sensitivity Analysis Two: Irrelevance of normalization details

In  other  sensitivity  experiments  on  analogical  processing  algorithms  (Forbus  &  Gentner,  1989),  we 
demonstrated  that  the  choice  of  normalization  algorithm  could  affect  outcomes  in  simulations  of  structural 
evaluation  in  comparisons.  The  purpose  of  this  analysis  is  to  determine  if  our  design  choice  of  using  unit 
content  vectors  (see  Section  3.3)  was  a  significant  factor  in  our  results. 

To  explore  this  question,  we  consider  two  variations  on  the  content  vector  representation.  The  first 
variation  is  simply  not  to  use  any  kind  of  normalization  at  all.  That  is,  we  simply  use  as  the  strength  of 
each  component  of  the  content  vector  the  number  of  statements  and  terms  which  contained  the 
corresponding  predicate.  (The  computation  of  normalized  content  vectors  involves  an  additional  step,  of 
dividing  each  component  by  the  total  number  of  predicates  in  the  description.)  The  results  of  this 
manipulation  are  illustrated  in  Table  7.  The  key  point  to  notice  about  this  table  is  that  the  viable  region  is 
roughly  the  same  size  and  shape  as  the  viable  region  for  normalized  content  vectors.  This  lends  support  to 
the  claim  that  the  outcome  of  the  simulation  experiments  is  not  heavily  determined  by  the  particular 
normalization  algorithm  chosen. 

4   = No analogies
64  = SF < AN
80  = LS < AN, SF < AN
88  = LS < SF, LS < AN, SF < AN
112 = LS < AN, LS < FA, SF < AN
120 = LS < SF, LS < AN, LS < FA, SF < AN
256 = LS < DT
336 = LS < AN, SF < AN, LS < DT
368 = LS < AN, LS < FA, SF < AN, LS < DT
376 = LS < SF, LS < AN, LS < FA, SF < AN, LS < DT

Rows are the width of the MAC selector. Columns correspond to the width of the FAC selector. The codes
describe whether that combination of selector widths allows MAC/FAC to account for the human data, and if not,
what criteria were violated.

Table 7: Sensitivity analysis, unnormalized content vectors

The second variation we consider is to change what aspect of the overlap content vectors measure. Recall
that  the  idea  of  content  vectors  is  to  compare  the  pattern  of  functors  which  appear  in  two  structured 
descriptions.  There  are  several  ways  to  characterize  such  patterns.  The  MAC/FAC  design  choice, 
normalized  content  vectors,  estimates  the  overlap  in  terms  of  the  relative  frequency  of  functors  in  the  two 
descriptions,  independent  of  their  sizes.  The  unnormalized  content  vectors  just  examined  estimate  the  total 
size  of  the  overlap.  But  is  it  the  pattern  of  overlap  that  is  relevant,  or  just  how  many  functors  two 
descriptions  have  in  common?  We  can  investigate  this  question  by  changing  the  structure  of  content 
vectors  so  that  they  represent  only  the  set  of  predicates  that  are  used  in  the  structured  representation, 
without  regard  to  number  of  occurrences.  We  call  this  variation  binary  content  vectors  since  each 
component is essentially a one bit answer to the question of whether the structured representation contains
or  does  not  contain  a  particular  predicate.  Thus  the  dot  product  of  two  binary  content  vectors  is  a  measure 
of  the  overlap  in  number  of  shared  predicates.  (Again,  we  normalize  to  unit  vectors,  both  to  avoid  size 
biases  and  because  we  assume  that  psychologically  plausible  implementation  substrates  (e.g.,  neural 
systems)  will  have  limited  dynamic  range.)  The  results  of  this  manipulation  are  shown  in  Table  8. 
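
The difference between the three content-vector variants can be made concrete with a small sketch. The helper names and the two toy predicate lists are hypothetical, and scaling to Euclidean unit length is an assumption about how the normalization is carried out.

```python
import math
from collections import Counter

def counts_vector(predicates):
    """Unnormalized variant: number of occurrences of each predicate."""
    return dict(Counter(predicates))

def binary_vector(predicates):
    """Binary variant: 1 for each predicate that occurs at all."""
    return {p: 1 for p in set(predicates)}

def unit(vec):
    """Scale a sparse vector to unit length (assumed Euclidean norm)."""
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {p: x / norm for p, x in vec.items()} if norm else {}

def dot(u, v):
    return sum(x * v.get(p, 0) for p, x in u.items())

# Two toy descriptions that use the same predicates with different frequencies.
a = ["CAUSE", "CAUSE", "CAUSE", "ATTACK"]
b = ["CAUSE", "ATTACK", "ATTACK", "ATTACK"]

raw_overlap = dot(counts_vector(a), counts_vector(b))
normalized_overlap = dot(unit(counts_vector(a)), unit(counts_vector(b)))
shared_predicates = dot(binary_vector(a), binary_vector(b))
binary_unit_overlap = dot(unit(binary_vector(a)), unit(binary_vector(b)))

print(raw_overlap, normalized_overlap, shared_predicates, binary_unit_overlap)
```

In this example the binary vectors treat the two descriptions as identical, because they share the same set of predicates, while the count-based vectors register the difference in how often each predicate occurs.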

Y   = Syslit predictions satisfied
4   = No analogies
64  = SF < AN
80  = LS < AN, SF < AN
112 = LS < AN, LS < FA, SF < AN

Binary content vectors measure the size of overlap in predicates. As before, rows are the width of the MAC
selector. Columns correspond to the width of the FAC selector. The codes describe whether that combination of
selector widths allows MAC/FAC to account for the human data, and if not, what criteria were violated.

Table 8: Sensitivity analysis for binary content vectors


Again,  the  overall  pattern  of  results  is  the  same:  With  selector  widths  that  are  too  narrow  no  analogies  are 
retrieved,  and  with  selector  widths  that  are  too  broad,  too  many  analogies  are  retrieved,  followed  as  widths 
increase  by  too  many  “junk”  retrievals.  The  interesting  difference  is  that  the  region  for  the  selector  widths 
has changed: The viable wide-FAC range lies with MAC between 30 and 50%, whereas it was between 10 and
20% for the original content vectors. Comparing the average number of representations output by MAC
for these ranges provides some insight as to why this should be so: For binary content vectors the average
output  size  was  2,  for  standard  content  vectors  the  average  was  1.5.  In  both  cases,  the  next  step  of  MAC 
selector  width  allows,  on  the  average,  another  representation  to  make  it  through  to  FAC.  Yet  one  more  step 
in  MAC  selector  width  allows  many  more  representations  to  get  through  to  FAC.  Thus  measuring  only  the 
number  of  shared  predicates  shifts  the  viable  region,  but  does  not  substantially  change  its  character. 

From  these  two  analyses,  we  conclude  that  the  choice  of  normalization  algorithm  does  not  substantively 
affect  the  results.  Since  the  normalization  algorithm  is  not  a  theoretically  determined  choice,  these  analyses 
support  the  conclusion  that  the  simulation  works  according  to  the  theoretical  account. 

5.4  Sensitivity  Analysis  Three:  Attributes  versus  relations 

Content  vectors  homogenize  structured  representations.  They  unify  information  about  attributes  of  objects, 
relationships  between  objects,  and  argument  structure.  Is  including  every  kind  of  information  in  content 
vectors  necessary?  Given  the  frequency  of  literal  similarity  and  surface  feature  matches,  both  of  which 
share  many  attributes,  a  possible  hypothesis  is  that  content  vectors  could  be  built  using  attributes  alone. 

At the other extreme, the approaches used in case-based reasoning tend to ignore attributes, and use only
relational  information.  To  mimic  these  approaches  in  MAC/FAC,  we  could  use  content  vectors  which 
leave  out  attributes,  and  include  only  relational  predicates.  This  analysis  explores  both  of  these  extreme 
hypotheses. 

To  explore  the  degree  to  which  using  attribute  information  only  in  content  vectors  would  allow  MAC/FAC 
to  satisfy  the  data,  we  modified  the  algorithm  which  computes  content  vectors  to  ignore  anything  other  than 
attributes. The results of the sensitivity analysis are shown in Table 9. The pattern of results is
dramatically different from that in previous experiments. There is no viable region at all. This experiment
provides  strong  evidence  that  using  attribute  information  alone  in  content  vectors  cannot  satisfy  the  data. 

The  failure  of  attributes  alone  to  provide  adequate  filtering  may  not  be  surprising.  Is  relational  information 
alone  enough?  To  explore  this  question  we  again  modified  the  algorithm  that  computes  content  vectors, 
this  time  to  not  include  attributes.  These  new  content  vectors  therefore  only  contained  relationships 
between  objects  and  higher-order  relations,  such  as  logical  connectives.  The  same  methodology  for  the 
sensitivity  analysis  was  followed. 

The  results  of  the  sensitivity  analysis  are  shown  in  Table  10.  Like  the  attribute-only  content  vectors,  the 
relation-only  content  vectors  also  fail  to  satisfy  the  data  in  a  psychologically  plausible  manner,  but  for 
different  reasons.  Almost  uniformly,  that  is,  when  either  the  MAC  width  is  less  than  40  %  or  when  the 
FAC  width  is  greater  than  70  %,  more  “junk”  matches  come  through  --  stories  from  other  sets,  and  FOR 
stories  (i.e.,  those  which  match  only  in  terms  of  first-order  relations  and  not  attributes  or  causal  structure). 
The  region  where  the  data  is  not  satisfied  and  the  MAC  width  ranges  between  20  %  and  70  %  is  very  much 
like  the  failures  that  occur  for  the  attribute-only  vectors  (e.g.,  more  analogies  retrieved  with  narrow  FAC 
than  psychologically  plausible).  There  is  in  fact  a  region  in  Table  10  where  the  pattern  of  results  matches 
the  human  data,  when  the  MAC  width  is  between  40%  and  50%  and  the  FAC  width  ranges  from  20%  to 
either  60%  or  70%.  However,  the  size  of  the  MAC  output  in  this  range  is  roughly  one  half  of  the  total  size 
of  the  memory  pool.  Consequently,  this  is  not  a  viable  region,  because  it  demands  far  too  much  of  FAC. 
Rows: MAC widths. Columns: FAC widths.

Y = Syslit predictions satisfied
4 = No analogies
64 = SF < AN
80 = LS < AN, SF < AN
88 = LS < SF, LS < AN, SF < AN

These results were obtained using content vectors which only included attributes, leaving out relations and logical
connectives. As before, rows are the width of the MAC selector, and columns correspond to the width of the FAC
selector. The codes describe whether that pair of selector widths allows MAC/FAC to account for the human data,
and if not, what criteria were violated.

Table  9  Sensitivity  analysis  of  attribute-only  content  vectors 
Rows: MAC widths. Columns: FAC widths.

MAC \ FAC    1%   10%   20%   30%   40%   50%   60%   70%   80%   90%  100%
1%          260   260   260   268   268   268   268   268   268   268   268
10%         270   268   268   268   268   268   268   268   268   268   268
20%         262   320   256   256   384   384   384   384   384   384   384
30%         260   256   256   256   264   264   264   264   264   264   264
40%           4    64     Y     Y     Y     Y     Y   256   256   256   256
50%           4    64     Y     Y     Y     Y     Y     Y   256   256   256
60%         [?]    64   [?]    64    80    80    80   336   336   336   336
70%         [?]    64    64    80    80    80   112   368   368   368   368
80%         [?]    64    64    80    80    80   112   368   368   368   368
90%           4    64    64    80    80    80   112   368   368   368   368
100%          4    64    64    80    80    80   112   368   368   368   368

Legend

Y = Satisfies the psychological data
4 = No analogies
64 = SF < AN
80 = LS < AN, SF < AN
112 = LS < AN, SF < AN, LS < FOR
256 = LS < DT
260 = No analogies, LS < DT, LS < DT
262 = No surface matches, no analogies, LS < DT, LS < DT
264 = LS < SF, LS < DT, LS < DT
268 = No analogies, LS < SF, LS < DT, LS < DT
270 = No surface matches, no analogies, LS < SF, LS < DT, LS < DT
336 = LS < AN, SF < AN, LS < DT, LS < DT
368 = LS < AN, SF < AN, LS < FOR, LS < DT
384 = AN < FOR, LS < DT

These results were obtained using content vectors which only included relations and logical connectives, leaving out
attributes. As before, rows are the width of the MAC selector, and columns correspond to the width of the FAC
selector. The codes describe whether that pair of selector widths allows MAC/FAC to account for the human data,
and if not, what criteria were violated.

Table  10  Manipulation:  Relation-only  vectors 
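
The numeric codes in the legends of Tables 9 and 10 appear to be additive: each code is a sum of powers of two, with one power per violated criterion (for example, 80 = 64 + 16). The sketch below decodes a code on that assumption. The flag assignments are inferred from the legends rather than stated in the text, so they should be treated as a reading aid only. Where the legend lists LS < DT twice for a code, the sketch reports it once.

```python
# Flag values inferred from the legends of Tables 9 and 10. This is an
# assumption: the text does not state that the codes are bit flags.
FLAGS = {
    2: "No surface matches",
    4: "No analogies",
    8: "LS < SF",
    16: "LS < AN",
    32: "LS < FOR",
    64: "SF < AN",
    128: "AN < FOR",
    256: "LS < DT",
}

def decode(code):
    """Return the violated criteria for a numeric table code, lowest flag first."""
    return [label for bit, label in sorted(FLAGS.items()) if code & bit]

# 320 occurs in Table 10 (MAC 20%, FAC 10%) but has no legend entry.
violations_320 = decode(320)
violations_368 = decode(368)
```

Under this reading, the unlisted code 320 decodes to SF < AN together with LS < DT.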


These  experiments  provide  evidence  that  neither  attribute  information  nor  relational  structure,  by 
themselves,  provide  the  right  kind  of  information  to  allow  the  MAC/FAC  model  to  plausibly  satisfy  the 
psychological  data.  Although  such  generalizations  must  be  viewed  with  caution,  the  analysis  of  why  these 
alternatives  fail  may  be  applied  to  any  retrieval  model,  not  just  MAC/FAC.  Using  attribute  information 
alone  does  not  allow  a  retrieval  system  to  satisfy  the  rare  insights  criterion,  since  the  relational  information 
is  not  used  as  a  cue  in  retrieval.  Using  relational  information  alone  tends  to  violate  the  scalability  criterion, 
since  large  fractions  of  memory  must  be  searched  when  the  discrimination  provided  by  the  relational 
vocabulary  is  inadequate. 
6.  Comparing  MAC/FAC  and  ARCS  on  ARCS  datasets 

As mentioned earlier, the model of similarity-based retrieval that is closest to MAC/FAC is ARCS
(Thagard et al., 1990). The ARCS algorithm is shown in Figure 8. ARCS uses a localist connectionist
network to apply semantic, structural, and pragmatic constraints to selecting items from memory. Most of
the work in ARCS is carried out by the constraint satisfaction network, which provides an elegant
mechanism for integrating the disparate constraints that Thagard et al. postulate as important to retrieval.
The  use  of  competition  in  retrieval  is  designed  to  reduce  the  number  of  candidates  retrieved.  Using 
pragmatic  information  provides  a  means  for  the  system’s  goals  to  affect  the  retrieval  process. 


Given a pool of memory items I1..In and a probe P:

1. For each item Ii, include it in a matching network if there are any predicates in Ii that are semantically similar
to a predicate in P. The matching network implements semantic and structural constraints.

2. Create inhibitory links between units representing competing retrieval hypotheses, to ensure competitive
retrieval.

3. Install pragmatic constraints by creating excitatory links between a special pragmatic node and every predicate
marked by the user as important.

4. Run the network until it settles.

Figure  8:  The  ARCS  algorithm 


After  the  network  settles,  an  ordering  can  be  placed  on  nodes  representing  retrieval  hypotheses  based  on 
their  activation.  Unfortunately,  no  formal  criterion  was  ever  specified  by  which  a  subset  of  these  retrieval 
hypotheses  is  selected  to  be  considered  as  what  is  retrieved  by  ARCS.  Consequently,  in  the  experiments 
below we mainly focus on the subset of retrieval nodes mentioned by Thagard et al. in their paper.


6.1  Theoretical  tradeoffs 

Both  models  have  their  appeals  and  drawbacks.  Here  we  briefly  examine  several  of  each. 

•  Pragmatic effects: In MAC/FAC it is assumed that pragmatics and context affect retrieval according to
what  is  encoded  in  the  probe.  That  is,  we  assume  that  plans  and  goals  are  important  enough  to  be 
explicitly  represented,  and  hence  will  affect  retrieval.  In  ARCS  additional  influence  can  be  placed  on 
particular  subsets  of  such  information  by  the  user  marking  it  as  important.  The  tradeoff  between  these 
alternatives  will  best  be  explored  by  embedding  them  in  larger,  task-oriented  simulations,  so  we  do  not 
consider  effects  of  pragmatics  further  in  this  paper. 

•  Utility  of  results:  Because  MAC/FAC  uses  SME  in  the  FAC  stage,  the  result  of  retrieval  can  include 
novel  candidate  inferences.  Since  the  purpose  of  retrieval  is  to  find  new  knowledge  to  apply  to  the 
probe,  this  is  a  substantial  advantage.  ARCS  could  close  this  gap  somewhat  by  using  ACME 
(Holyoak  &  Thagard,  1989)  as  a  post  processor. 

•  Initial filtering: MAC/FAC’s content vectors represent the overall pattern of predicates occurring in a
structured description, so that the dot product cheaply estimates overlap. ARCS’ commitment to
creating  a  network  if  there  is  any  predicate  overlap  places  more  of  the  retrieval  burden  on  the  expensive 
process  of  setting  up  networks.  The  inclusive  rather  than  exclusive  nature  of  ARCS’  initial  stage  leads 
to  the  paradoxical  fact  that  a  system  in  which  pragmatic  constraints  are  central  must  ignore  CAUSE, 
IF,  and  other  inferentially  important  predicates  to  be  tractable. 
•  Modeling inter-item effects: Wharton, Holyoak, Downing, Lange, Wickens, and Melz (1994) have
shown  that  ARCS  can  model  effects  of  competition  between  memory  items  in  heightening  the  relative 
effect  of  structural  similarity  to  the  probe. 

Perhaps  the  most  important  issue  is  the  notion  of  semantic  similarity.  A  key  issue  in  analogical  processing 
is  what  criterion  should  be  used  to  decide  if  two  elements  can  be  placed  into  correspondence.  The  FAC 
stage  of  MAC/FAC  follows  the  standard  structure-mapping  position  that  analogy  is  concerned  with 
discovering  identical  relational  systems.  Thus,  other  elements  can  be  matched  flexibly  in  service  of 
relational  matching:  any  two  entities  can  be  placed  in  correspondence  and  functions  can  be  matched  non- 
identically if doing so enables a larger structure to match. But relations have only three choices: they can
match identically, as in (a); they can fail to match, as in (b); or, if the surrounding structural match warrants
it, they can be re-represented in such a way that part of their representation now matches identically, as in
the shift from (c) to (d).

(a)  HEAVIER  [camel,  cow]  —  HEAVIER  [giraffe,  donkey] 

(b)  HEAVIER  [camel,  cow]  —  BITE  [dromedary,  calf] 

(c)  HEAVIER  [camel,  cow]  —  TALLER  [giraffe,  donkey] 

(d)  GREATER  [WEIGHT(camel),  WEIGHT(cow)]  —  GREATER  [HEIGHT(giraffe),  HEIGHT(donkey)]. 

ACME  and  ARCS  also  share  the  intuition  that  analogy  is  a  kind  of  compromise  between  similarity  of 
larger  structures  and  similarity  of  individual  elements  --  semantic  similarity,  in  Holyoak  and  Thagard's 
(1989)  terms.  But  the  local  similarity  metric  is  different.  These  systems  use  graded  similarity  at  all  levels 
and  for  all  kinds  of  predicates;  relations  have  no  special  status.  Thus  ARCS  and  ACME  might  find  pair 
(b)  above  more  similar  than  pair  (a),  because  of  the  object  similarity.  This  would  not  be  true  for  SME  and 
MAC/FAC. 

In  ACME,  semantic  similarity  was  operationalized  using  similarity  tables.  For  any  potential  matching 
term,  a  similarity  table  was  used  to  assign  a  similarity  rating,  which  was  then  combined  with  other  evidence 
to  decide  whether  the  two  predicates  could  match.  Thus,  in  the  examples  above,  both  pair  (b)  and  pair  (c) 
stand  a  good  chance  of  being  matched,  depending  on  the  stored  similarities  between  TALLER,  HEAVIER, 
and  BITE,  camel,  dromedary  and  giraffe,  and  so  on. 

In  ARCS,  an  augmented  subset  of  WordNet  (Miller,  Fellbaum,  Kegl,  &  Miller,  1988)  was  used  to  make 
semantic  similarity  decisions.  WordNet  is  a  psycholinguistic  database  describing  relationships  between 
words.  Two  predicates  in  ARCS  are  considered  semantically  similar  if  their  corresponding  lexical  concepts 
in  WordNet  are  connected  via  links  that  denote  particular  relationships.  The  use  of  WordNet  as  a 
database  for  simple  lexical  inferences  is  an  appealing  idea.  The  lexical  connections  found  in  this  way 
should have well-founded motivations. Nevertheless, it is important to remember that WordNet was intended
as  a  lexicon,  not  a  language  of  thought.  Using  the  lexical  concepts  of  WordNet  as  a  predicate  vocabulary 
requires  assuming  that  there  exist  conceptual  representations  that  correspond  to  these  lexical  concepts. 

That does not seem an implausible assumption. However, assuming that relationships between words, such
as synonym or antonym, are used in the cognitive processing of internal representations seems implausible.

We  prefer  our  tiered  identicality  account,  which  uses  inexpensive  inference  techniques  to  suggest  ways  to 
re-represent  non-identical  relations  into  a  canonical  representation  language.  Such  canonicalization  has 
many  advantages  for  complex,  rich  knowledge  systems,  where  meaning  arises  from  the  axioms  that 
predicates  participate  in.  When  mismatches  occur  in  a  context  where  it  is  desirable  to  make  the  match,  we 
assume  that  people  make  use  of  techniques  of  re-representation.  An  example  of  an  inexpensive  inference 
technique to suggest re-representation is Falkenhainer's (1987, 1990a) minimal ascension method, which
looks for common superordinates when context suggests that two predicates should match. The use of
pure identicality augmented by minimal ascension allowed Falkenhainer’s PHINEAS system to model the
discovery  of  a  variety  of  physical  theories  by  analogy.  We  believe  that  WordNet  could  be  used  in  a  similar 
fashion,  since  it  has  superordinate  information. 
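
To illustrate the idea of looking for common superordinates, here is a minimal sketch. The predicate hierarchy and the function names are hypothetical, invented for this example; they are not taken from PHINEAS or WordNet, and the sketch omits the contextual test that decides when a match should be attempted at all.

```python
# Hypothetical superordinate links (child -> parent). Illustrative only.
PARENT = {
    "HEAVIER": "GREATER-ON-DIMENSION",
    "TALLER": "GREATER-ON-DIMENSION",
    "GREATER-ON-DIMENSION": "COMPARISON",
    "BITE": "PHYSICAL-ACTION",
}

def ancestors(pred):
    """Return pred followed by its superordinates, nearest first."""
    chain = [pred]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def common_superordinate(p, q):
    """Return the superordinate shared by p and q that is closest to p, or None."""
    q_chain = set(ancestors(q))
    for candidate in ancestors(p):
        if candidate in q_chain:
            return candidate
    return None

shared = common_superordinate("HEAVIER", "TALLER")
unrelated = common_superordinate("HEAVIER", "BITE")
```

With this toy hierarchy, HEAVIER and TALLER share a superordinate and so are candidates for re-representation, as in the shift from pair (c) to pair (d) above, whereas HEAVIER and BITE share none and remain a mismatch, as in pair (b).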

Holyoak  &  Thagard  have  argued  that  broader  (i.e.,  weaker)  notions  of  semantic  similarity  are  crucial  in 
retrieval,  for  otherwise  we  would  suffer  from  too  many  missed  retrievals.  Although  this  at  first  sounds 
reasonable,  there  is  a  counterargument  based  on  memory  size.  Human  memories  are  far  larger  than  any 
cognitive  simulation  yet  constructed.  In  such  a  case,  the  problem  of  false  positives  (i.e.,  too  many 
irrelevant  retrievals)  becomes  critical.  False  negatives  are  of  course  a  problem,  but  they  can  be  overcome 
to  some  extent  by  reformulating  and  re-representing  the  probe,  treating  memory  access  as  an  iterative 
process  interleaved  with  other  forms  of  reasoning  (as  in  Lange  &  Wharton’s  (1992a,b,  1993)  REMIND 
model).  Thus  it  could  be  argued  that  strong  semantic  similarity  constraints,  combined  with  re- 
representation,  are  crucial  in  retrieval  as  well  as  in  mapping. 

How  do  these  different  accounts  of  semantic  similarity  fare  in  predicting  patterns  of  retrieval?  In  the  rest  of 
this  section  we  tackle  this  question  by  comparing  the  performance  of  MAC/FAC  and  ARCS  on  a  variety  of 
examples. 

6.2  Computational  Experiments  comparing  MAC/FAC  and  ARCS 
6.2.1  Methods 

Each experiment below has a similar structure. First each simulation is given a memory, consisting of one
or more databases drawn from the ARCS representations.14 Then retrieval is tested with probes drawn
from a small predefined set of stories, replicating Thagard et al.’s experiments. In some cases the memory
is augmented by a particular story: e.g., when probing with variant Hawk stories, the Thagard et al.
encoding of the “Karla the Hawk” story is added to memory. (This is done to see if the retrieval system is
able to find the base story amidst the distractors, given variations on the story as probes.)

For  brevity  we  specify  the  probe  set  and  memory  contents  symbolically,  using  “/”  to  distinguish  probe  set 
from memory and “+” to indicate set union. Thus HAWK/(PLAYS+Karla Base) indicates an experiment
where the database of plays, augmented with the Karla base story, was probed with the Hawk stories. A
description of the datasets used and
these  conventions  is  summarized  in  Figure  9. 


14  To  date  we  have  been  unsuccessful  in  getting  ARCS  to  run  on  many  of  the  representations  we  used  in 
Sections  4  and  5.  In  some  cases  ARCS’  network  does  not  settle  after  even  1,000  iterations,  and  run  times 
of  up  to  twelve  hours  have  been  required. 
Databases:

FABLES = 100 encodings of Aesop’s fables, encoded by Thagard et al.

PLAYS = 25 encodings of Shakespeare’s plays, encoded by Thagard et al.

Story sets used as probes and memory items:

HAWK = Thagard et al.’s encoding of the “Karla the Hawk” story set, i.e., original story, analog, appearance
match, false analogy, and literal similarity versions. Databases using these probes have the original story added to
memory, except when the original story itself is used as a probe.

SG = Thagard et al.’s encoding of the Sour Grapes fable plus variations, i.e., original story, analog, appearance,
and literal similarity versions. Databases using these probes have the original story added to memory, except when
the original story itself is used as a probe.

H&WSS = Thagard et al.’s encoding of Hamlet and West Side Story. When Hamlet is used as a probe it is removed
from memory. West Side Story is never placed in memory.

Convention: For convenience, we refer to an experimental setup by the probe stories followed by the database
used, e.g., SG/(FABLES+PLAYS) means that the Sour Grapes fables were used as probes with a memory
consisting of both plays and fables. When a story is used as a probe, it is removed from memory first.

Figure  9:  Databases  and  experimental  stories  used  in  the  experiments 


Both  MAC/FAC  and  ARCS  take  propositional  representations  as  inputs,  but  their  representation 
conventions  are  quite  different.  The  most  crucial  difference  is  that  structure-mapping  treats  attributes, 
relations,  and  functions  differently,  whereas  ARCS  does  not  distinguish  them.  We  used  the  following  rules 
in  translation:  (1)  One-place  predicates  were  classified  as  attributes,  (2)  multi-argument  predicates  were 
classified  as  relations,  and  (3)  since  the  arguments  to  CAUSE  could  be  either  events  or  modal  propositions, 
we treated predicates used as arguments to a CAUSE statement either as modal relations (e.g.,
BECOMING-TRUE) or functions (e.g., MARRIED, KILLED). Since functions can be substituted under
structure-mapping’s identicality criterion, we ran these experiments on representations translated both with
and  without  rule  (3)  above,  i.e.,  with  and  without  functions.  With  one  exception,  noted  below,  the  results 
were  essentially  identical  with  either  translation  scheme. 

All  runtimes  are  measured  according  to  the  Lucid  Common  Lisp  internal  clock.  A  single  computer15  was 
used for both simulations, so that run times would be comparable.

Replication  of  computational  experiments  is  still  something  of  a  novelty,  and  standards  for  ensuring  that 
reported  simulation  results  are  repeatable  have  not  yet  been  established  in  cognitive  science.  Nevertheless, 
we  have  taken  many  precautions  to  ensure  that  we  have  run  ARCS  correctly.  Where  numerical  information 
was available, for instance, we matched numerical results reported by Thagard et al. to several decimal places. One
concern  was  what  should  count  as  a  retrieval  in  ARCS.  Neither  the  original  ARCS  paper  nor  the  code 
defines  a  criterion  for  distinguishing  when  an  item  is  actually  retrieved  (indeed,  stories  with  negative 
activations  were  sometimes  considered  retrievals).  In  reporting  ARCS  results  we  cut  off  the  list  of 
retrieved  results  where  Thagard  et  al.  did.  In  some  cases  (e.g.,  fables)  this  represented  a  sharp  boundary,  in 
other  cases  (e.g.,  plays)  it  did  not. 

6.3  Experiment  1:  Sour  Grapes  Comparison 

In the first study the memory set consists of the fables, including the Sour Grapes fable, and the probes are
variants of Sour Grapes. For ARCS results, the numbers in parentheses represent the level of activation it
computed for that retrieval. For MAC/FAC results, the numbers in parentheses represent the scores
computed by the MAC or FAC stages, as marked.


15 An IBM RS/6000 Model 530, with 128MB of RAM using Lucid Common Lisp 4.01.

Table 11 shows the results. The results for ARCS match those reported for the simulation by Thagard
et al. The MAC/FAC results are quite similar. Thus both systems successfully retrieve Sour Grapes from
a  database  of  fables  when  given  variations  of  it.  However,  MAC/FAC  is  substantially  faster.  The  runtime 
difference  is  fairly  typical;  MAC/FAC  tends  to  be  two  orders  of  magnitude  faster  than  ARCS  when  tested 
with  identical  data  on  the  same  computer. 


ARCS results

Probe                              Results                      Seconds
Sour Grapes, appearance            Sour Grapes (0.28)           120
Sour Grapes, analog                Sour Grapes (0.21)           81
Sour Grapes, literal similarity    Sour Grapes (0.25)           123

MAC/FAC results

Probe                              Results                      Seconds
Sour Grapes, appearance            FAC: Sour Grapes (0.53)      0.3
                                   MAC: Sour Grapes (0.56)
Sour Grapes, analog                FAC: Sour Grapes (2.03)      0.2
                                   MAC: Sour Grapes (0.62)
Sour Grapes, literal similarity    FAC: Sour Grapes (2.03)      0.2
                                   MAC: Sour Grapes (0.62)

For  ARCS  results,  the  numbers  in  parentheses  represent  the  level  of  activation  it  computed  for  that 
retrieval.  For  MAC/FAC  results,  the  numbers  in  parentheses  represent  the  scores  computed  by  the  MAC 
or  FAC  stages,  as  marked. 

Table  11:  Results  for  SG/Fables  experiment 

6.4  Experiment  2:  Effects  of  additional  memory  items  on  retrieval  (Sour  Grapes) 

To  check  the  stability  of  results  under  changes  in  memory  contents,  we  reran  Experiment  1,  adding  the 
database  of  25  Shakespeare  plays  encoded  by  Thagard  et  al.  to  the  fables  database.  We  then  tested  the 
simulations  to  see  if  they  would  retrieve  Sour  Grapes  from  the  database  of  125  fables  and  plays  when 
probed with variations of Sour Grapes. The results are shown in Table 12. MAC/FAC’s results remain
unchanged,  except  for  a  small  increase  in  processing  time.  ARCS,  on  the  other  hand,  is  distracted  by  the 
plays  in  one  of  the  probe  conditions.  Increasing  the  memory  by  25%  has  led  to  different  results  with 
ARCS.  The  results  also  hint  at  a  possible  size  bias  in  ARCS:  It  appears  to  prefer  larger  descriptions  in 
retrieval,  at  the  cost  of  correct  matches. 




ARCS results

Probe                              Results                                          Seconds
Sour Grapes, appearance            Sour Grapes (0.28)                               327
Sour Grapes, analog                The Taming of the Shrew (0.22),                  251
                                   Merry Wives (0.18), [11 stories],
                                   Sour Grapes (-0.19)
Sour Grapes, literal similarity    Sour Grapes (0.25)                               373

MAC/FAC results

Probe                              Results                                          Seconds
Sour Grapes, appearance            FAC: Sour Grapes (0.53)                          0.4
                                   MAC: Sour Grapes (0.56)
Sour Grapes, analog                FAC: Sour Grapes (2.03)                          0.3
                                   MAC: Sour Grapes (0.62)
Sour Grapes, literal similarity    FAC: Sour Grapes (2.03)                          0.3
                                   MAC: Sour Grapes (0.62)

Table 12: Results of SG probes, database = Fables + Plays

6.5  Experiment  3:  Larger  Probe  sizes 

While  the  results  for  MAC/FAC  in  Experiment  2  are  satisfactory,  ARCS’  seemingly  poor  performance 
requires  further  investigation.  Does  the  relative  size  of  the  probe  matter  in  the  memory  swamping  effect? 
To find this out, we again ran both simulations, first with the plays database as memory, then with the 25
plays and 100 fables as memory, this time using as probes the Hamlet and West Side Story encodings,
as represented by Thagard et al. Given Hamlet as a probe, the question is whether the systems can
retrieve  a  tragedy,  or  at  least  another  Shakespeare  play.  Given  West  Side  Story  as  a  probe,  the  challenge  is 
more  specific:  To  retrieve  Romeo  &  Juliet,  the  analogous  play. 

Table  13  shows  the  results  for  plays  only  in  memory,  and  Table  14  shows  the  results  with  both  plays  and 
fables  in  memory.  The  good  news  for  ARCS  is  that  the  fables  have  only  minimally  intruded  on  the 
activation for the top-ranked retrieved plays. A Midsummer Night’s Dream is ARCS’ top-ranked retrieval
for West Side Story, but it did also, as stated by Thagard et al., retrieve Romeo & Juliet.
ARCS results

Probe              Results                                                        Seconds
Hamlet             Romeo & Juliet (0.54), King Lear (0.53), Othello (0.46),       1843
                   Cymbeline (0.42), Macbeth (0.41), Julius Caesar (0.38)
West Side Story    Midsummer Night’s Dream (0.58), Romeo & Juliet (0.57)          2539

MAC/FAC results

Probe              Results                                                        Seconds
Hamlet             FAC: Romeo & Juliet (6.79)                                     22
                   MAC: Othello (0.86), Macbeth (0.85), Romeo & Juliet (0.83),
                        Julius Caesar (0.81)
West Side Story    FAC: Romeo & Juliet (16.51)                                    13
                   MAC: Romeo & Juliet (0.88)

Table 13: Results for Hamlet, West Side Story as probes, Plays database.


MAC/FAC, on the other hand, retrieves only Romeo & Juliet with either probe. For West Side Story this
is indeed the expected result (and, we believe, more intuitive than ARCS’ result), but what is happening with
Hamlet? Examining the structural evaluation scores (i.e., the FAC scores) reveals that FAC considers the
match between West Side Story and Romeo & Juliet to be excellent (16.51), which makes sense because
the encodings of West Side Story and Romeo & Juliet have almost isomorphic structure. When Hamlet is
the probe, FAC is relatively indifferent; the FAC scores were: Romeo & Juliet (6.79), Julius Caesar
(5.49), Macbeth (3.72), Othello (2.67). The dropoff from Romeo & Juliet to the next-best match is about
20%, which exceeds MAC/FAC’s default cutoff of 10%, so only Romeo & Juliet is returned.
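
The cutoff arithmetic can be checked with a short sketch. It assumes that a selector keeps the best-scoring item together with every item whose score lies within a given fraction (the width) of the best; the function name and this exact formulation are assumptions made for illustration, and only the scores are taken from the text.

```python
def select_within(scores, width=0.10):
    """Keep every item whose score is within `width` of the best score.

    Assumes all scores are positive. Returns (name, score) pairs, best first.
    """
    if not scores:
        return []
    best = max(scores.values())
    threshold = best * (1.0 - width)
    kept = [(name, s) for name, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# FAC scores reported for the Hamlet probe.
fac_hamlet = {
    "Romeo & Juliet": 6.79,
    "Julius Caesar": 5.49,
    "Macbeth": 3.72,
    "Othello": 2.67,
}

retrieved = select_within(fac_hamlet, width=0.10)
dropoff = 1.0 - fac_hamlet["Julius Caesar"] / fac_hamlet["Romeo & Juliet"]
```

With a 10% width the threshold is 0.9 × 6.79 ≈ 6.11, so Julius Caesar (5.49) falls outside it; its dropoff from the best score is about 19%.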

ARCS results

Probe              Results                                                        Seconds
Hamlet             Romeo & Juliet (0.531), King Lear (0.528), Othello (0.45),     4112
                   Cymbeline (0.41), Macbeth (0.40), Julius Caesar (0.37)
West Side Story    Midsummer Night’s Dream (0.58), Romeo & Juliet (0.57)          5133

MAC/FAC results

Probe              Results                                                        Seconds
Hamlet             FAC: Romeo & Juliet (6.79)                                     26
                   MAC: Othello (0.86), Macbeth (0.85), Romeo & Juliet (0.83),
                        Julius Caesar (0.81), Fable52 (0.80)
West Side Story    FAC: Romeo & Juliet (16.51)                                    8
                   MAC: Romeo & Juliet (0.88)

Table 14: Results for Hamlet, West Side Story as probes, Plays + Fables database.
6.6  Experiment  4:  Hawk  stories 

The  goal  of  encoding  the  Hawk  stories  was  to  replicate  the  results  of  Karla  the  Hawk  studies  described  in 
Section 2.1.1. Thagard et al. encoded one story set and used the relative activation levels of the stories
computed by ARCS as relative retrieval probabilities for human subjects. As Section 4.4 pointed out,
ARCS’  order  of  retrieval  was:  literal  similarity,  first-order  overlap,  appearance,  analogy,  which  is  not  a 
close  match  to  the  observed  human  ordering  of  literal  similarity,  appearance,  analogy,  first-order  overlap. 
By  contrast,  MAC/FAC  matched  the  human  ordinal  results  in  our  simulation  of  this  experiment. 

However,  our  purpose  in  this  experiment  is  to  pursue  the  question  of  stability  of  results  under  different 
distractors. We ask two questions: (1) does MAC/FAC, using Thagard et al.’s encodings, perform
appropriately,  and  (2)  does  changing  the  database  used  as  ARCS’  memory  change  its  predicted  outcomes? 
Both  simulations  were  run  with  the  Hawk  stories  as  probes,  and  with  the  fables  (plus  Karla  story)  as 
memory  and  with  both  fables  and  plays  (plus  the  Karla  story)  as  memory.  The  results  are  shown  in  Table 
15  and  Table  16  respectively. 


ARCS results

Probe                         Results                                                         Seconds
Karla, literal similarity     “Karla” base (0.67)                                             315
Karla, appearance             Fable55 (0.4), [7 fables], “Karla” base (-0.17)                 176
Karla, analogy                Fable23 (0.33), [7 fables], “Karla” base (-0.27)                127
Karla, first-order overlap    Fable23 (0.0907), Fable55 (0.0903), [13 fables],                17
                              “Karla” base (-0.11)

MAC/FAC results

Probe                         Results                                                         Seconds
Karla, literal similarity     FAC: “Karla” (16.07)                                            6
                              MAC: “Karla” (0.81), Fable71 (0.74)
Karla, appearance             FAC: “Karla” (7.92)                                             7
                              MAC: “Karla” (0.71), Fable52 (0.71), Fable71 (0.66),
                                   Fable27 (0.65), Fable5 (0.64)
Karla, analog                 FAC: “Karla” (8.57)                                             14
                              MAC: “Karla” (0.81), Fable52 (0.77), Fable5 (0.77),
                                   Fable71 (0.76), Fable45 (0.75), Fable59 (0.75),
                                   Fable27 (0.75)
Karla, first-order overlap    FAC: “Karla” (5.33), Fable5 (5.33)                              7
                              MAC: “Karla” (0.73), Fable71 (0.71), Fable52 (0.71),
                                   Fable5 (0.71), Fable45 (0.69), Fable59 (0.68),
                                   Fable27 (0.68)

Table  15:  Results  for  HAWK  probes,  database  =  Fables  +  “Karla”  base  story 


No  matter  which  database  is  used,  MAC/FAC  always  retrieves  the  Karla  story,  irrespective  of  which 
variant  story  is  used  as  a  probe.  The  MAC  scores  explain  why:  In  each  case  the  Karla  story  is  at  the  top  of 
the  ranking,  indicating  that  the  pattern  of  identical  predicates  overlapping  is  greater  for  Karla  and  variant 
than  for  any  other  story.  The  fact  that  the  Karla  base  story  is  retrieved  for  the  literal  similarity  and 
appearance  variants  is  expected.  Its  retrieval  when  the  analogy  is  used  as  a  probe  is  also  reasonable 
(although  if  ARCS  always  retrieved  analogs  successfully  it  would  be  an  implausible  model).  Retrieving  the 
base story when the first-order overlap story is used as a probe is not so reasonable. We believe this occurs
because the Thagard et al. representations are rather sparse and include almost no surface information, and
thus are less natural than might be desired (cf. the specificity conjecture of Forbus & Gentner, 1989).


ARCS Results

| Probe | Results | Seconds |
|---|---|---|
| “Karla”, literal similarity | “Karla” base (0.67) | 614 |
| “Karla”, appearance | Fable55 (0.40), [16 stories], “Karla” base (-0.018) | 408 |
| “Karla”, analogy | Pericles (0.60), [17 stories], “Karla” base (-0.32) | 244 |
| “Karla”, false analogy | Pericles (0.58), [22 stories], “Karla” base (-0.38) | 45 |

MAC/FAC Results

| Probe | Results | Seconds |
|---|---|---|
| Karla, literal similarity | FAC: “Karla” (16.07); MAC: “Karla” (0.81), Fable71 (0.74) | 7 |
| Karla, appearance | FAC: “Karla” (7.92); MAC: “Karla” (0.71), Fable52 (0.71), Julius Caesar (0.69), Othello (0.68), Macbeth (0.67), Fable71 (0.66), Two Gentlemen of Verona (0.65), Fable27 (0.65), Hamlet (0.65), Fable5 (0.64) | 21 |
| Karla, analog | FAC: “Karla” (8.57); MAC: “Karla” (0.81), Julius Caesar (0.78), Two Gentlemen of Verona (0.78), Fable52 (0.77), Fable5 (0.77), Macbeth (0.76), As You Like It (0.76), Fable71 (0.76), Fable45 (0.75), Fable59 (0.75), Fable27 (0.75), Othello (0.75) | 37 |
| Karla, first-order overlap | FAC: “Karla” (5.33), Fable5 (5.33); MAC: “Karla” (0.73), Julius Caesar (0.72), Two Gentlemen of Verona (0.72), Fable71 (0.71), Fable52 (0.71), Fable5 (0.71), Macbeth (0.70), As You Like It (0.70), Othello (0.69), Fable45 (0.69), Hamlet (0.68) | 23 |

Table 16: Results for HAWK probes, with database = Fables + Plays + “Karla” base story

Interestingly, this experiment marks the only place where the decision to use functions in encoding made
any real difference in the results. If no functions were used in translating the ARCS representations, the
MAC results remained the same (because content vectors are based strictly on identical predicates), but the
Karla base story would be knocked out of the FAC output by other stories that had more overlapping
structure, since the causal structure in the Karla story could not be consistently mapped due to
non-identical relations. The fact that this problem only shows up with this one probe set, out of all the
representations made by Thagard et al., suggests that this is not a serious problem.

As was suggested by experiments 1 and 2, the ARCS results vary considerably with different distractor
sets. This means that the use of relative activations to estimate relative frequencies is not a stable measure.
Specifically, the relative ordering of first-order overlap and analogy reverses when the database of fables is
augmented with the plays. The position of the Karla story in the activation rankings is also alarming. For the
appearance story, which should retrieve the base almost as often as the literal similarity story, the base has
dropped from ninth in the ranking to 18th. Depending on where the retrieval cutoff is placed, the conclusion
might be that ARCS fails to retrieve the Karla story given the very close surface match.

6.7  Experiment  5:  ARCS  using  identicality 

The  results  so  far  indicate  that  MAC/FAC  is  far  more  immune  to  false  positives  than  ARCS.  What  is 
responsible  for  this  difference?  Is  it  MAC/FAC’s  use  of  a  separate  stage  that  performs  structural  filtering? 
The  use  of  content  vectors  versus  parallel  constraint  satisfaction  to  generate  an  initial  set  of  retrieval 
candidates?  MAC/FAC’s  identicality  constraint  versus  ARCS’  weaker  semantic  similarity  constraint?  A 
complete  answer  to  this  question  will  require  much  more  empirical  and  theoretical  work,  but  we  can  gain 
some  insight  by  a  simple  experiment.  We  ran  ARCS  again,  but  without  the  WordNet-inspired  similarity 
network.  Under  such  conditions,  ARCS  only  creates  local  matches  between  identical  predicates,  and  the 
initial  candidate  set  is  much  smaller,  because  the  semantic  similarity  constraint  has  been  greatly  tightened. 

The results of this experiment are shown in Tables 17-19. Table 17 shows that the results on Sour Grapes
have improved substantially; ARCS is no longer tempted by plays. Table 18 shows that, while A
Midsummer Night’s Dream is high on ARCS’ list, ARCS no longer prefers it to Romeo & Juliet when West
Side Story is used as a probe. The HAWK results show the least improvement; the estimated retrieval
order again does not match that of human subjects, and there are still many fables and plays ahead of what
should be very close matches to the Karla base story.


ARCS w/identicality, SG/FABLES

| Probe | Results | Seconds |
|---|---|---|
| | Sour Grapes (0.18) | 1.3 |
| Sour Grapes appearance | Sour Grapes (0.28) | 23 |
| | Sour Grapes (0.18) | 1.1 |

ARCS w/identicality, SG/(FABLES+PLAYS)

| Probe | Results | Seconds |
|---|---|---|
| | Sour Grapes (0.19) | 4 |
| Sour Grapes appearance | Sour Grapes (0.28) | 34 |
| | Sour Grapes (0.19) | 4 |

Table 17: ARCS w/identicality on Sour Grapes with Fables and Fables + Plays

ARCS w/identicality, database = plays

| Probe | Result | Seconds |
|---|---|---|
| Hamlet | King Lear (0.56), Romeo & Juliet (0.52), Othello (0.47), Cymbeline (0.41), Macbeth (0.40), Julius Caesar (0.38) | 489 |
| West Side Story | Romeo & Juliet (0.59), Midsummer Night’s Dream (0.52) | 1671 |

ARCS w/identicality, database = plays+Fables

| Probe | Result | Seconds |
|---|---|---|
| Hamlet | King Lear (0.55), Romeo & Juliet (0.51), Othello (0.46), Cymbeline (0.49), Macbeth (0.39), Julius Caesar (0.37) | 1108 |
| West Side Story | Romeo & Juliet (0.59), Midsummer Night’s Dream (0.52) | 3014 |

Table 18: ARCS w/identicality, probed with Plays


ARCS w/identicality, HAWK/FABLES

| Probe | Results | Seconds |
|---|---|---|
| Karla, Literal Similarity | Fable23 (0.261), Fable55 (0.258), Karla story (-0.1) | 73 |
| Karla, Appearance | Fable55 (0.4), [8 fables], Karla story (-0.23) | 114 |
| Karla, True Analogy | Fable23 (0.26), Fable55 (0.26), [5 fables], Karla story (-0.23) | 12 |
| Karla, First-Order overlap | Fable23 (0.087), Fable55 (0.087) | 5 |

ARCS w/identicality, HAWK/(FABLES+PLAYS)

| Probe | Results | Seconds |
|---|---|---|
| Karla, Literal Similarity | Fable23 (0.26), Fable55 (0.26), Karla story (-0.014) | 74 |
| Karla, Appearance | Fable55 (0.25), Hamlet (0.17), Fable23 (0.067), [17 plays & fables], Karla story (-0.22) | 154 |
| Karla, True Analogy | Pericles (0.55), [3 plays], Fable23 (-0.13), Fable55 (-0.13), [8 plays & fables], Karla story (-0.30) | 29 |
| Karla, First-Order overlap | Pericles (0.59), [6 fables & plays], Fable23 (-0.25), Fable55 (-0.25) | 18 |

Table 19: ARCS w/identicality, HAWK probes

6.8  Conclusions  from  Computational  Comparison  Experiments

The  results  of  cognitive  simulation  experiments  must  always  be  interpreted  with  care.  In  this  case,  we 
believe our experiments provide evidence that MAC/FAC, using structure-mapping’s identicality
constraint,  better  models  retrieval  than  ARCS,  which  uses  Thagard  et  al.’s  notion  of  semantic  similarity. 

In  retrieval,  the  special  demands  of  large  memories  argue  for  simpler  algorithms,  simply  because  the  cost  of 
false  positives  is  much  higher.  If  retrieval  were  a  one-shot,  all-or-nothing  operation,  the  cost  of  false 
negatives  would  be  higher.  But  that  is  not  the  case.  In  normal  situations,  retrieval  is  an  iterative  process, 
interleaved  with  the  construction  of  the  representations  being  used.  Thus  the  cost  of  false  negatives  is 
reduced  by  the  chance  that  reformulation  of  the  probe,  due  to  rerepresentation  and  inference,  will 
subsequently  catch  a  relevant  memory  that  slipped  by  once. 

While  both  are  designed  to  allow  parallel  implementations,  the  two  orders  of  magnitude  speed  difference 
clearly  suggests  that  MAC/FAC  is  the  more  practical  choice  for  cognitive  simulation  experiments. 

Finally,  we  note  that  while  ARCS’  use  of  a  localist  connectionist  network  to  implement  constraint 
satisfaction  is  in  many  ways  intuitively  appealing,  it  is  by  no  means  clear  that  such  implementations  are 
neurally  plausible.  On  the  other  hand,  we  believe  the  evidence  suggests  that  MAC/FAC  captures 
similarity-based  retrieval  phenomena  better  than  ARCS  does. 


7.  Discussion 

Similarity  is  central  in  transfer.  To  understand  its  role  requires  making  fine  distinctions  both  about 
similarity  and  about  transfer.  The  psychological  evidence  indicates  that  the  accessibility  of  matches  from 
memory  is  strongly  influenced  by  surface  commonalities  and  weakly  influenced  by  structural 
commonalities,  while  the  rated  inferential  soundness  of  comparisons  is  strongly  influenced  by  structural 
commonalities  and  little,  if  at  all,  influenced  by  surface  commonalities.  An  account  of  similarity  in  transfer 
must  deal  with  this  disassociation  between  retrieval  and  structural  alignment:  between  the  matches  people 
get  from  memory  and  the  matches  they  want. 

The  MAC/FAC  model  of  similarity-based  retrieval  captures  both  the  fact  that  humans  successfully  store 
and  retrieve  intricate  relational  structures  and  the  fact  that  access  to  these  stored  structures  is  heavily 
(though  not  entirely)  surface-driven.  The  first  stage  is  attentive  to  content  and  blind  to  structure  and  the 
second  stage  is  attentive  to  both  content  and  structure.  The  MAC  stage  uses  content  vectors,  a  novel 
summary  of  structured  representations,  to  provide  an  inexpensive  “wide  net”  search  of  memory,  whose 
results  are  pruned  by  the  more  expensive  literal  similarity  matcher  of  the  FAC  stage  to  arrive  at  useful, 
structurally  sound  matches. 
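
The two-stage organization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the nested-tuple encoding of descriptions, the unit-length normalization, the 10% selection window, and the `structural_score` callback (a stand-in for a structural matcher such as SME) are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def functors(expr):
    """Yield the head symbol of every nested tuple; bare strings are entities."""
    if isinstance(expr, tuple):
        yield expr[0]
        for arg in expr[1:]:
            yield from functors(arg)


def content_vector(description):
    """Sparse unit-length vector of functor counts; all structure is discarded."""
    counts = Counter(f for expr in description for f in functors(expr))
    norm = sqrt(sum(c * c for c in counts.values()))
    return {f: c / norm for f, c in counts.items()} if norm else {}


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * v.get(f, 0.0) for f, w in u.items())


def best_within(scored, window):
    """Keep the top-scoring item and anything within `window` of it (assumed rule)."""
    if not scored:
        return []
    best = max(score for _, score in scored)
    kept = [(name, score) for name, score in scored if score >= best * (1 - window)]
    return sorted(kept, key=lambda pair: -pair[1])


def mac_fac(probe, memory, structural_score, window=0.1):
    """MAC: cheap dot products over all of memory. FAC: costly matcher on survivors."""
    pv = content_vector(probe)
    mac = best_within([(n, dot(pv, content_vector(d))) for n, d in memory.items()], window)
    fac = best_within([(n, structural_score(probe, memory[n])) for n, _ in mac], window)
    return mac, fac
```

The point of the design is that `structural_score` is called only on the few items that survive the dot-product filter, so the expensive matcher never sees most of memory.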

The simulation results presented here demonstrate that MAC/FAC can simulate the patterns of access
exhibited by human subjects. It displays the appropriate preponderance of literal similarity and surface
matches, and it occasionally retrieves purely relational matches (Section 4). Our sensitivity studies suggest
that these results are a consequence of our theory, and are not hostage to non-theoretically motivated
parameters or algorithmic choices (Section 5). Our computational experiments comparing MAC/FAC and
ARCS (Section 6) suggest that MAC/FAC accounts for the psychological results both more accurately
and more robustly than ARCS. In addition to the experiments reported here, we have tested MAC/FAC on
a variety of other data sets, including relational metaphors (30 descriptions, average of 12 propositions
each) and attribute-rich descriptions of physical situations as might be found in commonsense reasoning
(12 descriptions, averaging 42 propositions each). We have also tried various combinations of these
databases with the Karla the Hawk data set (45 descriptions, averaging 67 propositions each). In all cases
to date MAC/FAC's performance has been satisfactory and consistent with the overall pattern of findings
regarding human retrieval.

However,  there  are  still  many  open  issues  regarding  access  and  MAC/FAC.  The  rest  of  this  section 
examines  the  most  important  of  these,  compares  MAC/FAC  to  other  directly  relevant  models,  and  explores 
some  broader  implications. 


7.1  Limitations  and  open  questions 

7.1.1  Retrieval  failure 

Sometimes a probe reminds us of nothing. Currently the only way this can happen in the MAC/FAC model
is for FAC to reject every candidate provided by MAC, which occurs when no structurally sound match
hypotheses can be generated between the probe and the descriptions output by MAC. (Without any local
correspondences there can be no interpretation of the comparison.) Such wholesale rejection is possible
but rare. A variant of MAC/FAC with thresholds on the output of either or both of the MAC and FAC
stages, so that the system would return nothing if the best match were below criterion, would show more
non-remindings.
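
A thresholded variant of this kind might look like the following sketch. The function name `remind` and the criterion values are hypothetical. The scores are those reported for the literal similarity probe in Table 15, and treating an item with no reported FAC score as scoring 0.0 is an assumption.

```python
def remind(mac_scores, fac_scores, mac_criterion, fac_criterion):
    """Return the best reminding, or None when nothing clears both criteria."""
    candidates = [name for name, s in mac_scores.items() if s >= mac_criterion]
    # Assumption: an item with no recorded FAC score counts as 0.0.
    survivors = [(name, fac_scores.get(name, 0.0)) for name in candidates]
    survivors = [(name, s) for name, s in survivors if s >= fac_criterion]
    if not survivors:
        return None  # a non-reminding
    return max(survivors, key=lambda pair: pair[1])[0]


# Scores from Table 15 (literal similarity probe); the criteria are made up.
mac_scores = {"Karla": 0.81, "Fable71": 0.74}
fac_scores = {"Karla": 16.07}
print(remind(mac_scores, fac_scores, mac_criterion=0.5, fac_criterion=10.0))  # Karla
print(remind(mac_scores, fac_scores, mac_criterion=0.9, fac_criterion=10.0))  # None
```

Raising either criterion increases the rate of non-remindings without changing which item wins when something is retrieved.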


7.1.2  Focused  remindings  and  penetrability 

Many  AI  retrieval  programs  and  cognitive  simulations  elevate  the  reasoner's  current  goals  to  a  central  role 
in their theoretical accounts (e.g., Hammond, 1986, 1989; Keane, 1988a,b; Kolodner, 1984, 1989; Riesbeck
&  Schank,  1989;  Thagard,  Holyoak,  Nelson,  &  Gochfeld,  1990).  Although  we  agree  with  the  claim  that 
goal  structures  are  important,  MAC/FAC  does  not  give  goals  a  separate  status  in  retrieval.  Rather,  we 
assume  that  the  person's  current  goals  are  represented  as  part  of  the  higher-order  structure  of  the  probe. 

The  assumption  is  that  goals  are  embedded  in  a  relational  structure  linking  them  to  the  rest  of  the  situation; 
they  play  a  role  in  retrieval,  but  the  rest  of  the  situational  factors  must  participate  as  well.  When  one  is 
hungry,  for  instance,  presumably  the  ways  of  getting  food  that  come  to  mind  are  different  if  one  is  standing 
in  a  restaurant,  a  supermarket,  or  in  the  middle  of  a  forest.  The  inclusion  of  current  goals  as  part  of  the 
representation  of  the  probe  is  consistent  with  the  finding  of  Read  and  Cesa  (1991)  that  asking  subjects  for 
explanations  of  current  scenarios  leads  to  a  relatively  high  rate  of  analogical  reminding.  However,  we  see 
no  reason  to  elevate  goals  above  other  kinds  of  higher-order  structure.  By  treating  goals  as  just  one  of 
many kinds of higher-order structures, we escape making the erroneous prediction of many case-based
reasoning systems: that retrieval requires common goals. People can retrieve information that was
originally  stored  under  different  goal  structures.  (See  Goldstein,  Kedar,  &  Bareiss,  1993,  for  a  discussion 
of  this  point.) 

A  related  question  concerns  the  degree  to  which  the  results  of  each  stage  are  inspectable  and  tunable.  We 
assume  that  the  results  of  the  FAC  stage  are  inspectable,  but  that  explicit  awareness  of  the  results  of  the 
MAC  stage  is  lacking.  We  conjecture  that  one  can  get  a  sense  that  there  are  possible  matches  in  the  MAC 
output,  and  perhaps  some  impression  of  how  strong  the  matches  are,  but  not  what  those  items  are.  The 
feeling  of  being  reminded  without  being  able  to  remember  the  actual  item  might  correspond  to  having 
candidates  generated  by  MAC  which  are  all  either  too  weak  to  pass  on  or  are  rejected  by  the  FAC  stage. 

How  much  can  MAC  and  FAC  be  affected  by  the  subject?  There  is  psychological  evidence  that  people 
cannot  directly  control  the  kinds  of  matches  they  retrieve.  Schumacher  and  Gentner  (in  preparation) 
investigated this by varying the test instructions given to subjects. They gave subjects lists of proverbs to
read, followed by test proverbs that were either structurally similar or surface-similar to proverbs studied
previously.  Subjects  who  were  told  to  write  any  prior  proverbs  that  they  were  reminded  of  while  reading 
the  test  proverbs  recalled  about  twice  as  many  surface  matches  as  analogies.  Another  group  of  subjects 
was  told  to  write  only  structural  remindings  and  to  strive  for  as  many  of  these  as  possible.  Although  these 
subjects  indeed  wrote  many  fewer  surface  matches  than  the  first  group,  they  recalled  only  the  same  low 
number  of  analogies.  The  goal  to  seek  relational  matches  apparently  led  people  to  filter  non-relational 
matches,  but  not  to  find  more  relational  matches.  This  suggests  that  the  FAC  matcher  may  be  tunable  (in 
that  subjects  were  able  to  filter  out  the  surface  matches)  but  not  the  MAC  matcher  (in  that  subjects  were 
not  able  to  produce  more  analogies  on  demand). 

The  idea  that  FAC,  though  not  MAC,  is  tunable  is  consistent  with  evidence  that  people  can  be  selective  in 
similarity matching once both members of a pair are present. For example, in a triads task, matching XX
to OO or XO, subjects can readily choose either only relational matches (XX -- OO) or only surface
matches (XX -- XO) (Gentner & Markman, 1994a,b; Goldstone, Medin & Gentner, 1991; Medin,
Goldstone  &  Gentner,  1993).  This  could  be  modeled  by  assuming  either  that  the  FAC  matcher  is  tuned 
according  to  such  task  constraints,  or  that  there  is  a  separate  tunable  similarity  processor  for  comparing 
items  within  working  memory. 


7.1.3  Size  of  content  vectors 

One  potential  problem  with  scaling  up  MAC/FAC  is  the  potential  growth  in  the  size  of  content  vectors. 

Our  current  descriptions  use  a  vocabulary  of  only  a  few  hundred  distinct  predicates.  We  implement  content 
vectors  via  sparse  encoding  techniques,  analogous  to  those  used  in  computational  matrix  algebra,  for 
efficiency.  However,  a  psychologically  plausible  representation  vocabulary  may  have  hundreds  of 
thousands  of  predicates.  It  is  not  obvious  that  our  sparse  encoding  techniques  will  suffice  for  vocabularies 
that  large,  nor  does  this  implementation  address  the  question  of  how  systems  with  limited  “hardware 
bandwidth”,  such  as  connectionist  implementations,  could  serve  as  a  substrate  for  this  model. 

This  scale  problem  is  mitigated  partly  by  MAC/FAC’s  basic  architecture  with  its  cheap  initial  filter. 
However, there are at least two further possible ways to address the potential scale problem in the size of
content  vectors.  The  first  is  abstraction.  In  symbolic  knowledge  representations,  predicates  and  functions 
are  often  arranged  in  hierarchies.  For  example,  a  complex  concept  such  as  bequeath  might  be  stored  as  a 
specialization  of  the  concept  of  giving,  which  might  in  turn  be  a  specialization  of  the  concept  of  transfer. 
Let  us  view  the  set  of  specializations  between  predicates  as  a  lattice.  Any  set  of  predicates  that  partitions 
the  lattice  can  be  used  to  formulate  a  semantically  compressed  content  vector  as  follows:  The  weight  of  a 
component  of  the  compressed  content  vector  is  a  function  of  the  number  of  occurrences  of  that  predicate  — 
and  all  predicates  below  it  in  the  partition  —  in  the  description.  In  effect,  predicates  below  the  selected 
subsets  are  replaced  with  more  abstract  versions.  Another  possible  solution  for  the  scale  problem  is 
factorization: the predicates could be partitioned into subsets that are tightly interrelated and separate
content  vectors  computed  for  each  subset.  This  organization  presumes  that  there  is  some  fixed  size  bound 
on  processing  modules,  but  that  several  processing  modules  can  be  synchronized  well  enough  to 
accumulate  results  across  them. 
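
The abstraction idea can be made concrete with a small sketch. It simplifies the lattice to a tree in which each predicate has at most one parent, and it takes the weight of a compressed component to be the plain sum of the counts beneath it. Both are assumptions, since the text only requires the weight to be some function of those counts. The `PARENT` table and the predicates other than bequeath, give and transfer are hypothetical.

```python
from collections import Counter

# Hypothetical specialization links: child predicate -> more abstract parent.
PARENT = {"bequeath": "give", "donate": "give", "give": "transfer", "sell": "transfer"}


def abstract_to(pred, cut, parent):
    """Climb the specialization links until a predicate in `cut` is reached."""
    node = pred
    while node not in cut:
        if node not in parent:
            return pred  # no selected ancestor: keep the predicate unchanged
        node = parent[node]
    return node


def compressed_counts(pred_counts, cut, parent=PARENT):
    """Fold each predicate's count into its selected ancestor, if it has one."""
    out = Counter()
    for pred, n in pred_counts.items():
        out[abstract_to(pred, cut, parent)] += n
    return out


counts = {"bequeath": 2, "donate": 1, "sell": 1, "cause": 1}
print(compressed_counts(counts, {"give"}))      # give absorbs bequeath and donate
print(compressed_counts(counts, {"transfer"}))  # transfer absorbs all four transfers
```

Choosing a higher cut in the hierarchy yields a shorter vector at the price of conflating predicates that a lower cut would keep distinct.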


7.1.4  Combining  similarity  effects  across  items 

MAC/FAC  is  currently  a  purely  exemplar-based  memory  system.  The  memory  items  can  be  highly 
situation-specific encodings of perceptual stimuli, abstract mathematical descriptions, causal scenarios, etc.
However, MAC/FAC lacks the capacity to model inter-item effects. For example, MAC/FAC does not capture
competition among items. Wharton, Holyoak, Downing, Lange, and Wickens (1991, 1992) and Wharton, Holyoak,
Downing, Lange, Wickens, and Melz (1994) have shown an intriguing effect where competition between
exemplars  heightens  the  relative  effect  of  structural  similarity  in  retrieval.  MAC/FAC  also  does  not 
average  across  several  items  at  retrieval  (Medin  &  Schaffer,  1978)  or  derive  a  global  sense  of  familiarity 
by  combining  the  activations  of  multiple  retrievals  (Gillund  &  Shiffrin,  1984;  Hintzman,  1986, 1988).  An 
interesting  extension  of  MAC/FAC  would  be  to  include  this  kind  of  between-item  processing  upon 
retrieval. 

If  such  inter-item  averaging  occurs,  it  could  provide  a  route  to  the  incremental  construction  of  abstractions 
and  indexing  information  in  memory.  We  see  three  plausible  ways  to  do  this.  First,  as  above,  the 
descriptions  output  by  the  MAC  stage  could  be  compared.  Second,  the  access  system  might  incrementally 
build  up  something  like  Minsky's  (1981)  similarity  network,  using  the  history  of  retrievals  to  encode 
difference  descriptions  to  simplify  future  access.  Third,  the  descriptions  output  by  the  FAC  stage  could  be 
compared:  SME  could  be  used  to  carry  out  structural  abstraction  across  several  descriptions  (as  in 
Skorstad,  Medin,  &  Gentner,  1988)  to  produce  a  combined  description  as  the  FAC  output.  The  first  and 
third  models  are  both  forms  of  “late  averaging”  accounts,  and  it  would  be  interesting  to  compare  these 
techniques  with  other  models  that  account  for  prototype  effects  by  combining  exemplars  at  retrieval 
(Hintzman, 1986, 1988; Medin & Schaffer, 1978).


7.1.5  Iterative  access 

Keane  (1988c,  1991;  Keane  &  Brayshaw,  1988)  and  Burstein  (1983a,b)  have  proposed  incremental 
mapping  processes.  We  suggest  that  similarity-based  retrieval  may  also  be  an  iterative  process.  In 
particular,  in  active  retrieval  (as  opposed  to  spontaneous  remindings)  we  conjecture  that  MAC/FAC  may 
be used iteratively, each time modifying the probe in response to the previous match (cf. Falkenhainer
1987, 1990; Gentner 1989). Suppose, for example, that a probe yielded several partial remindings. The system
of  matches  could  provide  clues  as  to  which  aspects  of  the  probe  are  more  or  less  relevant,  and  thus  should 
be  highlighted  or  suppressed  on  the  next  iteration.  MAC  should  respond  to  this  altered  vector  by  returning 
more  relevant  items,  and  FAC  can  then  select  the  best  of  these. 
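
One way to picture a single iteration of such probe modification is to rescale components of the probe's content vector before the next MAC pass. The sketch below is purely illustrative: the `reweight` function, its `gain` and `damp` factors, and the toy predicates are assumptions, and the text does not commit to any particular update rule.

```python
from math import sqrt


def unit(vec):
    """Scale a sparse vector (dict) to unit length."""
    norm = sqrt(sum(w * w for w in vec.values()))
    return {k: w / norm for k, w in vec.items()} if norm else {}


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * v.get(k, 0.0) for k, w in u.items())


def reweight(probe_vec, highlight=(), suppress=(), gain=2.0, damp=0.5):
    """Boost highlighted predicates, damp suppressed ones, then renormalize."""
    scaled = {}
    for pred, w in probe_vec.items():
        if pred in highlight:
            w *= gain
        elif pred in suppress:
            w *= damp
        scaled[pred] = w
    return unit(scaled)


probe = unit({"cause": 1, "attack": 1, "bird": 1})
relational_item = unit({"cause": 1, "attack": 1})
surface_item = unit({"bird": 1, "fly": 1})

revised = reweight(probe, highlight={"cause"}, suppress={"bird"})
print(dot(probe, relational_item), dot(revised, relational_item))  # score rises
print(dot(probe, surface_item), dot(revised, surface_item))        # score falls
```

In this toy case, highlighting the causal predicate and suppressing the object attribute raises the MAC score of the relational item and lowers that of the surface item, which is the direction of change the iterative account needs.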

Another  advantage  of  such  incremental  reminding  is  that  it  might  help  explain  how  we  derive  new  relational 
categories.  Barsalou's  (1982,  1987)  ad  hoc  categories,  such  as  “things  to  take  on  a  picnic”  and  Glucksberg 
and  Keysar's  (1990)  metaphorically  based  categories,  such  as  “jail”  as  a  prototypical  confining  institution, 
are  examples  of  the  kinds  of  abstract  relational  commonalities  that  might  be  highlighted  during  a  process  of 
incremental  retrieval  and  mapping. 


7.1.6  Embedding  in  performance-oriented  models 

MAC/FAC  is  not  itself  a  complete  analogical  processing  system.  For  example,  both  constructing  a  model 
from  multiple  analogs  (e.g.,  Burstein,  1983a,b)  and  learning  a  domain  theory  by  analogy  (e.g., 
Falkenhainer  1987, 1988,  1990b)  require  multiple  iterations  of  accessing,  mapping,  and  evaluating 
descriptions.  Several  psychological  questions  about  access  cannot  be  studied  without  embedding 
MAC/FAC  in  a  more  comprehensive  model  of  analogical  processing.  First,  as  discussed  above,  there  is 
ample  evidence  that  subjects  can  choose  to  focus  on  different  kinds  of  similarity  when  the  items  being 
compared  are  both  already  in  working  memory.  Embedding  MAC/FAC  in  a  larger  system  should  help 
make  clear  whether  this  penetrability  should  be  modeled  as  applying  to  the  FAC  system  or  to  a  separate 
similarity  engine.  (Order  effects  in  analogical  problem  solving  (Keane,  in  press)  suggest  the  latter.) 

A  second  issue  that  requires  a  larger,  performance-oriented  model  to  explore  via  simulation  is  when  and 
how pragmatic constraints should be incorporated (cf. Holyoak & Thagard, 1989; Thagard et al., 1990).
Since we assume that goals, plans, and similar control knowledge are explicitly represented in working
memory, the MAC stage will include such predicates in the content vector for the probe, and hence will be
influenced by pragmatic concerns. There are two ways to model the effects of pragmatics on the FAC
stage. The first is to use the SME pragmatic marking algorithm (Forbus and Oblinger, 1990) as a
relevance filter. The second is to use incremental mapping, as in Keane & Brayshaw’s (1988) Incremental
Analogy Machine (IAM). This technique permits the selection and grouping of sets of correspondences to
be influenced by the task at hand (Forbus, Ferguson, & Gentner, 1994).

A  recent  simulation  by  Lange  and  Wharton  (1992,  1993)  called  REMIND  models  retrieval  in  the  context  of 
natural  language  processing,  using  spreading  activation  in  a  connectionist  network  both  to  construct  a 
conceptual  representation  from  textual  input  and  to  find  the  most  similar  story  in  its  episodic  memory. 
REMIND  is  an  intriguing  model,  and  the  attempt  to  integrate  multiple  cognitive  processes  into  larger 
models  is  an  important  activity.  However,  it  is  difficult  to  compare  this  model  with  MAC/FAC  and  other 
retrieval  models.  First,  REMIND  only  models  a  specific  retrieval  task,  namely  retrieval  in  the  service  of 
understanding  stories,  and  thus  does  not  attempt  to  cover  as  wide  a  span  of  phenomena  as  MAC/FAC. 
Second,  when  REMIND  retrieves  a  story,  it  does  not  appear  to  create  correspondences  between  the 
understanding  of  its  input  and  the  previous  story,  nor  does  it  generate  novel  candidate  inferences,  and  thus 
does  not  satisfy  the  structured  mappings  criterion  for  retrieval.  Third,  REMIND  has  only  been  tested  on  a 
corpus involving a handful of short (i.e., two-sentence) stories. To our knowledge, it has never been tested
either on a corpus as large as those used with MAC/FAC and ARCS or on a corpus that includes
examples as large as those used with MAC/FAC. Even their current small databases stretch the limits of a
Connection Machine,16 which makes it difficult to evaluate their model thoroughly.


7.1.7  Expertise  and  Relational  Access 

Despite  the  gloomy  picture  painted  in  the  present  research  and  in  most  of  the  problem-solving  research, 
there  is  evidence  of  considerable  relational  access  (a)  for  experts  in  a  domain  and  (b)  when  initial  encoding 
of  the  study  set  is  relatively  intensive.  Novick  (1988a,b)  studied  remindings  for  mathematics  problems 
using  novice  and  expert  mathematics  students.  She  found  that  experts  were  more  likely  than  novices  to 
retrieve  a  structurally  similar  prior  problem,  and  when  they  did  retrieve  a  surface-similar  problem,  they 
were  quicker  to  reject  it  than  were  novices.  Faries  and  Reiser  (1988)  taught  subjects  LISP  in  a  series  of 
intensive  training  sessions  and  then  gave  them  target  problems  that  were  superficially  similar  to  one  prior 
problem  and  structurally  similar  to  another.  Given  this  intensive  training,  Faries  and  Reiser’s  subjects  were 
able  to  access  structurally  similar  problems  despite  the  competing  superficial  similarities. 

The  second  contributor  to  relational  retrieval,  almost  certainly  related  to  the  first,  is  intensive  encoding. 
Gick  and  Holyoak  (1983)  and  Catrambone  and  Holyoak  (1987,  1989)  found  that  subjects  exhibited 
increased  relational  retrieval  when  they  were  required  to  compare  two  prior  analogs,  but  not  when  they 
were  simply  given  two  prior  analogs.  Schumacher  &  Gentner  (1987)  found  increased  relational  retrieval  of 
proverbs  when  subjects  wrote  out  the  meaning  of  each  proverb  on  the  study  list,  as  opposed  to  simply 
reading  it  or  rating  its  cleverness.  Seifert,  McKoon,  Abelson,  and  Ratcliff  (1986)  investigated  priming 
effects  in  a  sentence  verification  task  between  thematically  similar  (analogical)  stories.  They  obtained 
priming  when  subjects  first  studied  a  list  of  themes  and  then  judged  the  thematic  similarity  of  pairs  of 
stories,  but  not  when  subjects  simply  read  the  stories. 

The increase of relational reminding with expertise and with intensive encoding can be accommodated in
the MAC/FAC model. First, we assume that experts have richer and better structured representations of the
relations in the content domain than do novices (Carey, 1985; Chi, 1978; Reed, Ackinclose, & Voss, 1990).
This fits with developmental evidence that as children come to notice and encode higher-order relations
such as symmetry and monotonicity, their appreciation of abstract similarity increases (Kotovsky &
Gentner, 1990; Gentner & Rattermann, 1991). Second, we speculate in particular that experts may have a
more uniform internal relational vocabulary within the domain of expertise than do novices (Clement,
Mawby & Giles, 1994; Gentner & Rattermann, 1991; Gentner, Rattermann, Markman, & Kotovsky, in
press). The idea is that experts tend to have relatively comprehensive theories in a domain and that this
promotes canonical relational encodings within the domain.

16 Trent Lange, personal communication, IJCAI-93.

To  the  extent  that  a  given  higher-order  relational  pattern  is  used  to  encode  a  given  situation,  it  will  of 
course  be  automatically  incorporated  into  MAC/FAC's  content  vector.  This  means  that  any  higher-order 
relational  concept  that  is  widely  used  in  a  domain  will  tend  to  increase  the  uniformity  of  the  representations 
in  memory.  This  increased  uniformity  should  increase  the  mutual  accessibility  of  situations  within  the 
domain.  Thus  as  experts  come  to  encode  a  domain  according  to  a  uniform  set  of  principles,  the  likelihood 
of  appropriate  relational  remindings  increases.  That  is,  under  the  MAC/FAC  model,  the  differences  in 
retrieval  patterns  for  novices  and  experts  are  explained  in  terms  of  differences  in  knowledge,  rather  than  by 
the  construction  of  explicit  indices. 
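
The effect of a uniform relational vocabulary on the first-stage match can be illustrated with a minimal sketch. It assumes, for illustration only, that a content vector is an unnormalized count of the functors occurring in a description and that descriptions are nested tuples; the predicate names and the four example encodings are hypothetical and are not taken from our simulations.

```python
from collections import Counter

def content_vector(description):
    """Count how often each functor occurs in a list of nested tuple expressions.

    Assumption: entities are plain strings and are not counted.
    """
    counts = Counter()

    def walk(expr):
        if isinstance(expr, tuple):
            counts[expr[0]] += 1          # the functor of this expression
            for arg in expr[1:]:
                walk(arg)

    for expr in description:
        walk(expr)
    return counts

def dot(u, v):
    """Dot product of two sparse count vectors."""
    return sum(n * v[p] for p, n in u.items())

# Hypothetical novice encodings: each situation uses its own relational terms.
novice_heat = [("makes-happen", ("hotter", "coffee", "ice"),
                ("heat-moves", "coffee", "ice"))]
novice_water = [("leads-to", ("more-pressure", "beaker", "vial"),
                 ("water-runs", "beaker", "vial"))]

# Hypothetical expert encodings: both situations share canonical relations.
expert_heat = [("cause",
                ("greater", ("temperature", "coffee"), ("temperature", "ice")),
                ("flow", "heat", "coffee", "ice"))]
expert_water = [("cause",
                 ("greater", ("pressure", "beaker"), ("pressure", "vial")),
                 ("flow", "water", "beaker", "vial"))]

novice_score = dot(content_vector(novice_heat), content_vector(novice_water))
expert_score = dot(content_vector(expert_heat), content_vector(expert_water))
```

Under these assumptions the two novice encodings share no functors, so their dot product is zero, whereas the expert encodings overlap on cause, greater, and flow. No index has been constructed in either case; the difference in accessibility follows from the encodings alone.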

Bassok  has  made  an  interesting  argument  that  indirectly  supports  this  claim  of  greater  relational  uniformity 
for  experts  than  for  novices  (Bassok  &  Wu,  in  preparation).  Noting  prior  findings  that  in  forming 
representations  of  novel  texts  people's  interpretations  of  verbs  depend  on  the  nouns  attached  to  them 
(Gentner,  1981;  Gentner  &  France,  1988),  Bassok  suggests  that  particular  representations  of  the  relational 
structure  may  thus  be  idiosyncratically  related  to  the  surface  content,  and  that  this  is  one  contributor  to  the 
poor  relational  access.  If  this  is  true,  and  if  we  are  correct  in  our  supposition  that  experts  tend  to  have  a 
relatively  uniform  relational  vocabulary,  then  an  advantage  for  experts  in  relational  access  would  be 
predicted. 

As  domain  expertise  increases,  MAC/FAC's  activity  may  come  to  resemble  a  multi-goal  case-based 
reasoning  model.  We  can  think  of  its  content  vectors  as  indices  with  the  property  that  they  change 
automatically  with  any  change  in  the  representation  of  domain  exemplars.  Thus  as  domain  knowledge  -- 
particularly  the  higher-order  relational  vocabulary  --  increases,  MAC/FAC  may  come  to  have  sufficiently 
elaborated  representations  to  permit  a  fairly  high  proportion  of  relational  remindings.  The  case-based 
reasoning  emphasis  on  retrieving  prior  examples  and  generalizations  that  are  inferentially  useful  may  thus 
be  a  reasonable  approximation  to  the  way  experts  retrieve  knowledge. 
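
The sense in which content vectors act as indices that change automatically can be sketched as follows. This is an illustration under stated assumptions rather than the MAC/FAC implementation: the Memory class, its method names, and the example descriptions are hypothetical, and the vector is taken to be an unnormalized functor count derived from the stored description at probe time.

```python
from collections import Counter

def functors(expr):
    """Yield the functor of every nested tuple expression (strings are entities)."""
    if isinstance(expr, tuple):
        yield expr[0]
        for arg in expr[1:]:
            yield from functors(arg)

def vector(description):
    """Derive a count vector directly from a description."""
    return Counter(f for expr in description for f in functors(expr))

class Memory:
    """Hypothetical store with no hand-built index: vectors come from the items."""

    def __init__(self):
        self.items = {}

    def store(self, name, description):
        self.items[name] = description    # storing again replaces the encoding

    def rank(self, probe):
        pv = vector(probe)
        scores = {}
        for name, description in self.items.items():
            mv = vector(description)
            scores[name] = sum(n * mv[f] for f, n in pv.items())
        return sorted(scores, key=scores.get, reverse=True)

probe = [("negative-feedback",
          ("senses", "governor", "engine"),
          ("adjusts", "governor", "valve"))]

memory = Memory()
memory.store("boiler", [("senses", "gauge", "boiler"),
                        ("adjusts", "operator", "valve")])
memory.store("thermostat", [("senses", "thermostat", "room"),
                            ("switches", "thermostat", "furnace")])
before = memory.rank(probe)

# Re-encode the thermostat with a higher-order relation; nothing else is updated.
memory.store("thermostat", [("negative-feedback",
                             ("senses", "thermostat", "room"),
                             ("adjusts", "thermostat", "furnace"))])
after = memory.rank(probe)
```

In this sketch the thermostat moves ahead of the boiler once it is re-encoded with the higher-order relation. The only change made was to the representation of the exemplar; the ranking shifts because the vector is recomputed from that representation.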


7.1.8  Comparison  with  recent  approaches 


Retrieval  models  tend  to  be  of  two  types:  Cognitive  simulations  that  are  committed  to  modeling  human 
performance  in  detail,  and  case-based  reasoning  systems  that,  while  inspired  by  human  cognition,  are 
equally  concerned  with  effective  performance  on  practical  problems.  Of  recent  cognitive  simulations,  the 
closest  is  the  REMIND  model  of  Lange  and  Wharton  (1992,  1993),  discussed  in  Section  7.1.6.  Of  the 
case-based reasoning systems, one of the closest to our approach seems to be the CaPER system (Kettler,
Hendler,  and  Anderson,  1992).  CaPER  is  designed  to  retrieve  all  sufficiently  similar  plans  from  an 
unindexed  case-base,  using  massive  parallelism  to  do  a  simple,  non-structural  match  between  a  query  and 
the contents of memory. It would be very interesting to see how well the parallel techniques used in CaPER could
be  applied  to  MAC/FAC. 

7.2 The Decomposition of Similarity

The  dissociation  between  surface  similarity  and  structural  similarity  across  different  processes  has  broader 
implications  for  cognition,  and  is  related  to  several  recent  discussions.  Medin,  Goldstone  and  Gentner 
(1993)  and  Gentner  (1989)  have  argued  that  similarity  is  pluralistic,  in  the  sense  that  there  are  multiple 
subclasses  of  similarity  and  multiple  influences  on  how  it  is  computed.  Rips  (1989)  demonstrated  a 
dissociation  between  similarity,  typicality,  and  categorization.  Murphy  and  Medin  (1985)  and  Keil  (1989) 
have  commented  on  the  limited  usefulness  of  simple  similarity  and  pointed  out  that  physical  resemblance 
does  not  provide  a  sufficient  basis  for  determining  conceptual  groupings.  As  discussed  above,  there  is  a 
relational  shift  in  development  (Gentner  &  Rattermann,  1991;  Halford,  1992).  Finally,  local  object  matches 
appear  to  be  processed  faster  by  adults  than  structural  commonalities.  Goldstone  and  Medin  (1994)  found 
that  local  similarities  have  their  effects  on  mapping  earlier  than  global  relational  similarities  in  a  timed 
mapping  task,  and  Ratcliff  and  McKoon  (1989)  found  convergent  results  in  a  sentence-matching  task: 
subjects  could  discriminate  new  from  old  sentences  faster  if  the  new  sentences  contained  all  new  words 
(e.g., “Helen attracted Jeff.” vs. “Andrew accosted Mary.”) than if the sentences differed only in relational
structure  (e.g.,  “Helen  attracted  Jeff.”  vs.  “Jeff  attracted  Helen.”).  In  pilot  experiments  using  perceptual 
stimuli,  in  which  subjects  were  timed  under  different  kinds  of  mapping  instructions,  Markman  and  Gentner 
(in  press)  have  found  that  subjects  are  faster  to  choose  on  the  basis  of  similar  objects  than  on  the  basis  of 
similar  relations,  even  when  the  two  rules  dictate  the  same  response. 

These  kinds  of  results  render  less  plausible  the  notion  of  a  unitary  similarity  that  governs  retrieval, 
evaluation  and  inference.  Instead,  they  suggest  a  more  complex,  pluralistic  view  of  similarity.  MAC/FAC 
provides  an  architecture  which  demonstrates  how  such  a  pluralistic  notion  of  similarity  can  be  organized  to 
account  for  psychological  data  on  retrieval. 